Abstract


A  subgrouped  version  of  the  Real  Time  Recurrent  Learning  (RTRL)  network  was 
written  in  C,  and  its  capabilities  were  evaluated.  Although  the  RTRL  net  architecturally 
consists  of  one  layer  of  neurons  it  successfully  learns  the  XOR  problem,  and  can  be  trained 
to  perform  time  dependent  functions  such  as  emulating  a  digital  low  pass  filter,  and 
internalizing  a  state  model  of  a  data  sequence.  The  net  was  tested  as  a  predictor,  to 
evaluate its ability to predict the future value of a chaotic signal based on past behavior.
While  the  net  was  not  able  to  predict  a  chaotic  signal's  future  output,  it  tracked  the  signal 
closely. The net was also tested as a classifier for time varying phenomena, differentiating
five classes of vehicle images based on features extracted from the visual
information. The net achieved a 99.2% accuracy in recognizing the five vehicle classes.
Recognition  was  based  on  the  sequences  of  vector  quantized  codewords  which  represented 
feature  changes  caused  by  shifting  the  vehicle  image  aspect  over  time. 

The  various  operating  parameters  of  the  subgrouped  recurrent  net  program  (initial 
learning  rate,  momentum,  minimum  allowed  sigmoidal  derivative,  teacher  forced  learning, 
weight  update  error  threshold  and  continuity  of  recurrence  between  training  epochs)  were 
tested  for  their  impact  in  learning  performance,  as  applied  to  phoneme  group  classification 
and  a  low  pass  Butterworth  filter  emulation.  The  behavior  of  the  subgrouped  RTRL  net 
was compared to the RTRL net described in Capt Randall Lindsey's AFIT Master's
thesis(7). Varying the net operating parameters demonstrated how gains in network error
reduction  could  be  obtained,  and  the  subgrouped  RTRL  network  performance  proved  close 
to  the  RTRL  algorithm  in  accuracy  while  reducing  the  time  required  for  updating  network 
weights  during  training  for  a  multiple  output  (classification)  problem. 

A SUBGROUPED REAL TIME RECURRENT
LEARNING  NEURAL  NETWORK 

I. Introduction

Neural  networks  have  been  receiving  a  tremendous  amount  of  interest  lately,  not 
only  from  the  engineers  and  researchers  who  are  applying  them  to  solve  problems,  but 
from  the  non-technical  general  population  as  well.  They  are  often  likened  to  the  human 
brain,  learning  from  experience  to  solve  general  problems.  While  an  intriguing  analogy, 
any  attempt  to  imply  that  neural  networks  work  in  the  same  way  as  a  human  brain  is 
misleading.  Neural  networks  are  computer  algorithms,  many  forms  of  which  were 
inspired  by  the  apparent  method  in  which  neurons  process  information  in  biological 
systems.  New  variations  of  neural  networks  are  being  generated  continuously,  and  the 
best  type  of  neural  net  to  apply  depends  on  the  characteristics  of  the  problem  being 
solved. 

Many  of  the  problems  being  attacked  by  neural  networks  are  time  dependent,  i.e. 
the  pattern  learned  by  the  network  varies  over  time,  and  each  state  of  the  output  is  in 
some  way  dependent  on  information  processed  prior  to  that  point.  This  makes  it  essential 
to  know  what  happened  in  the  past  to  correctly  process  the  current  data.  To  solve  such 
tasks  with  neural  nets  requires  some  method  of  capturing  temporal  information. 
Recurrent neural networks perform this feat by feeding information from the hidden
and/or output nodes back into the network inputs. This allows the network to see the
current data as well as a processed version of prior input data.

The addition of temporal information may make a recurrent network better at
solving  problems  such  as  predicting  commodity  prices,  identifying  moving  targets  or 
identifying  the  different  sounds,  called  phonemes,  in  human  speech. 

1.1  Problem 

The Real Time Recurrent Learning (RTRL) network is a recurrent neural net that
has been shown to learn time dependent functions such as tracking analog
signals, imitating a digital filter and recognizing sequences (17)(7). One well known
limitation of the RTRL algorithm (20), however, is the amount of computation required
to update the weights, which is on the order of $O(n^4)$ for n neurons. This makes large,
multiple network output problems computationally expensive to train, and in some cases
impractical. The goal of this thesis was to determine the behavior and performance of the
subgrouped  RTRL  network  described  by  Zipser  (20).  This  variation  of  the  RTRL 
algorithm  reduces  the  computational  requirements  for  training  the  network  for  multiple 
output  problems  requiring  larger  numbers  of  neurons. 

The  problem  faced  in  this  thesis  was  to  quantify  the  behavior  and  performance  of 
the  subgrouped  RTRL  network,  and  to  apply  it  to  problems  where  the  characteristics  of 
the  net  will  be  beneficial.  Because  the  subgrouped  RTRL  network  is  a  time  dependent 
neural  network,  it  was  applied  to  two  problems  with  inherent  time  dependencies  within 
the  data: 

A.  Predicting  the  daily  opening  values  of  the  pound  in  the  London 
Exchange  based  on  past  performance. 

B.  The  problem  of  classifying  images  based  on  sequences  of  vector 
quantized  data,  representing  aspect  or  point  of  view  changes  in  the 
observation  of  5  different  vehicles  over  time. 

1.2 Background

With  the  myriad  symposia,  conferences,  and  publications  currently  devoted  to 
neural  nets,  it  is  often  difficult  to  maintain  a  current  understanding  of  the  "state  of  the  art" 
in  neural  networks.  Not  only  are  new  forms  of  networks  continually  being  developed,  but 
the  more  established  neural  networks  (Cybenko,  feedforward,  Hopfield,  Adaptive 
Resonance  Theory,  etc.)  are  continuously  being  modified,  tweaked  and  improved  upon, 
creating  a  multitude  of  related  offspring.  This  thesis  will  focus  on  those  networks  that 
specifically  incorporate  time  as  part  of  the  processing  of  information,  and  particularly  on 
the  subgrouped  Real  Time  Recurrent  Learning  (RTRL)  network. 

1.3  Scope 

The  scope  of  this  thesis  is  to  characterize  the  behavior  of  the  subgrouped  RTRL 
network,  as  applied  to  the  problems  examined.  This  includes  its  application  to  the 
prediction  of  the  opening  value  of  the  pound  on  the  London  Exchange,  and  the  vehicle 
identification  problem  based  on  sequences  of  feature  vectors(3)  as  the  image  aspect 
changes  over  time.  The  subgrouped  RTRL  network  is  a  modified  version  of  Capt  Randall 
Lindsey's  thesis  program  (7),  which  is  based  on  the  RTRL  algorithm(17)(20). 
Comparisons in performance of the RTRL and subgrouped RTRL nets are also made, to
determine how subgrouping impacts the training time and accuracy of the network.

1.4  Approach 

The differences between the performance of the RTRL and subgrouped RTRL
networks will be examined by repeating several of the demonstration tasks
performed in Lindsey's thesis. This will determine whether the subgrouped RTRL
network has the same functionality as Lindsey's RTRL code.

The  network  will  also  be  evaluated  as  a  predictor  and  as  a  classifier.  The  ability 
to predict will be examined by training the network on historical data derived from one
year's worth of opening values for the pound on the London Exchange, with the desired
output  being  the  opening  value  of  the  next  day.  After  training,  the  net  will  be  tested  using 
opening  value  data  from  a  different  year. 

The  network's  ability  to  classify  will  be  evaluated  by  applying  the  subgrouped 
RTRL  network  to  the  problem  of  image  identification.  The  network  will  be  trained  to 
differentiate  between  the  images  of  five  different  vehicles,  based  on  sequences  of  vector 
quantized  codewords  which  encode  changes  in  aspect  as  the  viewing  angle  on  the 
vehicles  changes  over  time.  The  4000  sequences  in  the  data  source  file  (800  sequences 
per  vehicle,  five  vehicles)  will  be  placed  in  random  order  and  divided,  using  the  first  90% 
of  the  sequences  for  training  the  network,  and  using  the  other  10%  to  test  the  accuracy  of 
the  network  after  training. 
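
The shuffle-and-split step described above can be sketched in C as follows. This is a
minimal illustration under stated assumptions, not the thesis program: the function names
`shuffle_indices` and `training_count` are hypothetical, and `rand()` stands in for
whatever random source was actually used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Fill idx with 0..count-1 and put it in random order (Fisher-Yates).
 * rand() is an assumed stand-in; its slight modulo bias is ignored here. */
void shuffle_indices(size_t *idx, size_t count)
{
    assert(idx != NULL);
    for (size_t i = 0; i < count; ++i)
        idx[i] = i;
    for (size_t i = count; i > 1; --i) {
        size_t j = (size_t)rand() % i;   /* 0 <= j < i */
        size_t tmp = idx[i - 1];
        idx[i - 1] = idx[j];
        idx[j] = tmp;
    }
}

/* Number of sequences in the training portion when the first
 * train_percent of the shuffled list is used for training. */
size_t training_count(size_t total, unsigned train_percent)
{
    assert(train_percent <= 100);
    return total * train_percent / 100;
}
```

For the 4000 sequences described above, `training_count(4000, 90)` gives 3600 training
sequences, leaving the remaining 400 shuffled sequences for testing.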

Chapter  II  provides  background  information  on  neural  networks,  and  on  time 
dependent  neural  networks  in  particular.  It  also  discusses  the  source  of  the  Pound 
monetary  exchange  rate  values  used  to  test  the  net's  ability  to  predict,  and  the 
preprocessing of the data used for the vehicle classification problem. Chapter III delves
into the algorithms used by the RTRL and subgrouped RTRL networks, discusses several
operating parameters of the net that can be changed to enhance performance, and
reviews  the  test  methodology  used  to  characterize  the  capabilities  of  the  subgrouped 
RTRL  network.  In  Chapter  IV,  the  testing  results  are  examined,  and  in  Chapter  V 
conclusions  and  recommendations  are  presented. 

II. Literature Review


2.1  Introduction 

The  purpose  of  this  literature  review  is  to  synopsize  the  current  state  of  time 
dependent  neural  networks,  with  particular  attention  paid  to  recurrent  neural  networks. 

Neural  networks  represent  man's  attempts  to  learn  from  nature's  multi-billion  year 
experiment  with  life,  in  which  the  more  effective  and  advantageous  methods  of  living  in  a 
potentially  hostile  world  are  passed  on  and  improved  through  the  generations  of  living 
things.  Because  of  nature's  head  start  on  us  in  developing  sophisticated  methods  of 
coping  with  the  environment,  we  are  only  now  developing  systems  with  the  capacity  that 
insects  take  for  granted,  i.e.  pattern  recognition,  feature  extraction,  and  autonomous 
travel. 

Recurrent  neural  networks  are  a  subset  of  the  many  varieties  of  neural  nets,  and 
have  the  added  ability  of  incorporating  time  dependency  into  the  evaluation  of  data.  As 
many  phenomena  currently  being  evaluated  with  neural  networks  are  time  varying 
(speech, visual processing), this property may be essential to creating systems that can
understand the spoken word or interpret their visual environment.

This  section  contains  an  overview  of  neural  network  theory,  leading  into  a 
discussion  of  the  various  neural  networks  that  incorporate  changes  over  time  into  their 
training and function. The focus will be on Time Delay Neural Networks (TDNN),
backpropagation through time (BPTT), real-time recurrent learning (RTRL), and
subgrouped RTRL networks.


2.2  Background 

Neural  networks  are  algorithms  often  based  on  the  observed  collective  behavior  of 
neurons  in  biological  systems.  In  living  organisms  possessing  a  nervous  system,  neurons 
interconnect  with  each  other  as  well  as  with  sensory  organs  and  muscles.  The  strength  of 
the  signals  transferred  to  a  neuron  depends  on  the  number  of  synaptic  connections  from 
other  neurons,  the  activity  (nerve  depolarizations  per  second)  of  a  stimulating  neuron,  the 
added  stimulation  or  inhibition  provided  by  other  neurons,  and  how  fast  the 
neurotransmitters  being  produced  at  the  synapses  are  broken  down  and  reabsorbed.  All 
these  factors  can  be  considered  as  weighted  inputs  which  influence  whether  the  neuron 
receiving  the  stimulus  will  fire,  and  how  fast  it  will  fire.  This  is  modeled  in  neural 
networks  by  attributing  weights  to  the  interconnections  in  the  network,  and  modifying  the 
value  of  the  weights  in  order  to  train  the  network  to  perform  a  function. 

Figure 1: A two layer multilayer perceptron backpropagation network. (Diagram labels, from
top to bottom: network outputs, layer 2, 2nd layer weights, layer 1, 1st layer weights,
layer 0, bias and data inputs.)

In the standard multilayer perceptron network, the "neurons" are arranged in
layers. Figure 1 shows a two layer network which will be discussed in the following
paragraphs. Data feeds into the lowest layer, which is often represented as layer zero of the
network. In multilayer networks, each level below the output layer provides inputs to the
next higher layer. Each neuron in the network multiplies each of its inputs by a weight
associated with that input, and sums the products together. This sum for a neuron i
receiving n inputs can be described by:

$$ s_i = \sum_{j=1}^{n} w_{ij} x_j + b_i \tag{1} $$

The weight $w_{ij}$ is a member of a matrix, with i ranging from one to the
number of neurons in the layer containing neuron i, and j ranging from one to the number
of inputs from the layer below. The input $x_j$ represents either the outputs of the preceding
layer or, if $x_j$ lies on the lowest layer, the data being fed into the net, while $b_i$ is a bias
added to the inputs. If the neuron i is linear, $s_i$ is the output of the neuron. If the neuron
has a non-linear output function, however, the sum is fed into the non-linear function
(sigmoid, tanh, hard limiter or threshold) to produce the final output.
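
The weighted sum of equation (1) can be sketched in C as shown below. The function names
`weighted_sum` and `hard_limiter` and the flat array layout are assumptions made for
illustration; the hard limiter is used here only because it is the simplest of the output
functions named above.

```c
#include <assert.h>
#include <stddef.h>

/* Equation (1): s_i = sum over j of w_ij * x_j, plus the bias b_i.
 * w_row points at the n weights feeding neuron i. */
double weighted_sum(const double *w_row, const double *x, size_t n, double bias)
{
    assert(w_row != NULL && x != NULL);
    double s = bias;
    for (size_t j = 0; j < n; ++j)
        s += w_row[j] * x[j];
    return s;
}

/* Hard limiter output: 1 when the sum reaches the threshold, else 0. */
double hard_limiter(double s, double threshold)
{
    return (s >= threshold) ? 1.0 : 0.0;
}
```

With weights {0.5, -0.25}, inputs {1.0, 2.0} and a bias of 0.25, the sum is
0.5 - 0.5 + 0.25 = 0.25, which a hard limiter with threshold zero maps to an output of 1.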

Training of the network is accomplished by adjusting the weights incrementally in
a way that reduces the error between the output of the neuron and its desired output,
which for the top layer of the network is shown as

$$ e_i = d_i - y_i \tag{2} $$

where $y_i$ is the output of neuron i, and $d_i$ is the neuron's desired output. If the neuron has
a linear output, the error in the output of neuron i caused by weight $w_{ij}$ depends on input
$x_j$ multiplied by the weight $w_{ij}$. Changing the weights to reduce the error can be
performed by a simple formula

$$ w^{+}_{ij} = w^{-}_{ij} - \eta e_i x_j \tag{3} $$

where $\eta$ is the learning constant for updating the weights, and $w^{-}$ and $w^{+}$ refer to the
weight prior to and after updating, respectively.

If the output of the neuron is non-linear, the weight update is a little more
complicated. In the case of a sigmoidal output function, one of the most commonly used
non-linear functions, the summed inputs of the neuron are processed by the formula

$$ f(s_i) = \frac{1}{1 + e^{-s_i}} \tag{4} $$

In this case, the change in the error for that neuron for the weight being updated
($\partial e / \partial w_{ij}$) depends on the input that was multiplied by the weight and the
derivative of the non-linear function. For the sigmoid function, the derivative is

$$ f'(s_i) = f(s_i)\,(1 - f(s_i)) \tag{5} $$

leading to a weight update formula of

$$ w^{+}_{ij} = w^{-}_{ij} - \eta e_i f(s_i)\,(1 - f(s_i))\, x_j \tag{6} $$
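
A single incremental update of the weights feeding one sigmoidal output neuron might look
like the C sketch below. The function names are hypothetical. The sketch computes the error
as the desired output minus the actual output, as in equation (2), and moves each weight in
the direction that reduces the squared error under that definition, which means the step
is added; if the error is measured with the opposite sign, the same step is subtracted
instead. That sign choice is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Sigmoid derivative written in terms of the output y = f(s):
 * f'(s) = y * (1 - y), as in equation (5). */
double sigmoid_slope(double y)
{
    return y * (1.0 - y);
}

/* Update the n weights feeding one output neuron.
 * d is the desired output, y the actual sigmoid output, eta the learning
 * constant. With e = d - y, ADDING eta * e * f'(s) * x_j lowers the
 * squared error (assumed sign convention, see the text above). */
void update_output_weights(double *w_row, const double *x, size_t n,
                           double d, double y, double eta)
{
    assert(w_row != NULL && x != NULL);
    double step = eta * (d - y) * sigmoid_slope(y);
    for (size_t j = 0; j < n; ++j)
        w_row[j] += step * x[j];
}
```

Writing the derivative in terms of the neuron output avoids recomputing the exponential
during training, since the output is already available from the forward pass.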

If  the  network  has  a  layer  of  neurons  below  the  output  layer  (usually  called  a  hidden 
layer),  there  is  no  set  desired  output  for  these  neurons  to  train  on.  Instead,  the  error 
generated  by  these  neurons  must  be  inferred  by  their  net  effect  on  the  error  of  the  output 
layer  neurons.  This  carrying  of  the  errors  produced  at  the  output  of  the  network  back  to 
the  hidden  layers  is  the  origin  of  the  term  backpropagation.  For  a  neuron  j  in  the  hidden 
layer  this  error  depends  on  the  weights  between  neuron  j  and  the  output  neurons,  and  on 
the  derivative  of  the  sigmoidal  function  used  by  the  output  neurons.  This  can  be 
summarized  by 

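$$ e_j = \sum_{l} w_{lj}\, e_l\, f(s_l)\,(1 - f(s_l)) \tag{7} $$

where the sum runs over the output neurons l, and $w_{lj}$ is the weight between hidden
neuron j and output neuron l.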

Using  this  term  for  the  change  in  the  output  error  generated  by  the  output  of  neuron  j, 
and  with  the  same  dependencies  on  the  sigmoidal  function  and  the  inputs  into  neuron  j  for 
updating  the  weights  as  was  seen  in  the  output  level,  we  can  update  the  hidden  layer 
weights  using 

$$ w^{+}_{ji} = w^{-}_{ji} - \eta\, f(s_j)\,(1 - f(s_j))\, x_i \sum_{l} w_{lj}\, e_l\, f(s_l)\,(1 - f(s_l)) \tag{8} $$

where $w_{ji}$ represents the weight matrix used to weight the inputs from the next lower
level, and l indexes the output neurons. For a network with only two layers of neurons,
$x_i$ is one of the data inputs being fed into the network.
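
The error attributed to a hidden neuron, as described above, can be sketched in C as a sum
over the output neurons it feeds. The function name `hidden_error` and the row-major weight
layout are assumptions for illustration, not taken from the thesis code.

```c
#include <assert.h>
#include <stddef.h>

/* Error inferred for hidden neuron j from the output layer.
 * w_out[l * n_hidden + j] is the weight from hidden neuron j to output
 * neuron l (assumed row-major layout); e_out[l] and y_out[l] are the
 * error and sigmoid output of output neuron l. */
double hidden_error(const double *w_out, const double *e_out,
                    const double *y_out, size_t n_out,
                    size_t n_hidden, size_t j)
{
    assert(w_out != NULL && e_out != NULL && y_out != NULL);
    assert(j < n_hidden);
    double sum = 0.0;
    for (size_t l = 0; l < n_out; ++l)
        sum += w_out[l * n_hidden + j] * e_out[l] * y_out[l] * (1.0 - y_out[l]);
    return sum;
}
```

The returned value is then scaled by the hidden neuron's own sigmoid derivative and by its
input before being applied to the hidden layer weights.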

By updating the weights in the network incrementally over multiple (often
thousands of) epochs, in which the input data is passed through the network each epoch,
the weights eventually reach a point at which the error has reached a minimum. This
minimum may be the lowest possible error that can be achieved, or it may be a "local
minimum" in which the net has become caught. Because changing the weights
incrementally to travel between a local minimum and the global minimum would raise
the output error temporarily in the process, the learning algorithm described above may
not be able to reach the lowest possible output error.

The  preceding  paragraphs  provide  a  top  level,  non-mathematically  intensive 
description  of  how  a  standard  backpropagation  neural  net  operates.  For  the  interested 
mathematically  inclined  reader,  many  excellent  texts  provide  a  detailed  derivation  of  the 
backpropagation  algorithm  (12). 

2.3 Scope of Literature Review

Because  this  thesis  is  based  on  the  use  of  a  time  dependent  neural  network,  specifically 
the  subgrouped  RTRL  algorithm,  this  review  will  focus  on  those  types  of  neural  networks 
that  are  designed  to  incorporate  time  as  a  dimension  in  the  training  and  function  of  the 
network.  There  are  many  forms  of  networks  that  use  time  in  some  manner,  with 
variations  and  entirely  new  architectures  being  introduced  regularly.  Therefore,  the  broad 
classes  of  the  currently  well  known  time  based  neural  networks  will  be  discussed.  A  brief 
description  of  the  derivation  of  image  features  used  for  the  vehicle  classification  problem 
is also provided.

2.4 Time Delay Neural Networks

The element of time can be incorporated into the training of neural networks in
several ways. The inputs into the network may include more than one "frame" of the
training data, which is shifted through as the net is trained (Figure 2). The Time Delay
Neural Network (TDNN) (Figure 3) operates on the same principle, where each input is
split N times, with each of the N branches delayed by a different increment in time. This
widens the window that the net "sees" of the data, to incorporate N time slices of the data
stream. Waibel(18)(19) has used this type of network with some success for the
identification of phonemes in Japanese.
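
The shifting input window described above can be sketched in C as a small buffer that drops
its oldest frame and appends the newest one at each time step. The function name
`shift_in_frame` and the oldest-first layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Shift a window of `frames` time samples by one step and append the
 * newest sample. Each frame holds `width` input values, and the oldest
 * frame is stored first. */
void shift_in_frame(double *window, size_t frames, size_t width,
                    const double *newest)
{
    assert(window != NULL && newest != NULL && frames > 0);
    memmove(window, window + width, (frames - 1) * width * sizeof *window);
    memcpy(window + (frames - 1) * width, newest, width * sizeof *window);
}
```

With three frames of two inputs each, the net would be presented with six values per time
step: the two current inputs and the two preceding samples of each.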


Figure 2: Input data is shifted along inputs to the net. In this example, the net
sees three time samples of two inputs.

Figure 3: Basic neuron in a Time Delay Neural Network. Each input is split N+1 times, with each ...

2.5  Recurrent  Network  Variations 

Recurrence in a neural network basically involves the feeding back of the outputs
of neurons in the network to other neurons on the same layer or at lower levels. Jordan(6)
proposed a network that operated like a standard two layer backpropagation network, but
fed back the outputs of the network as inputs, allowing the net to "see" what was
produced during the last iteration (Figure 4). The recurrent output values were fed into
the hidden layer neurons, as well as having each state unit neuron feeding back to itself,
multiplied by an attenuation factor. Elman(2) described a variation on this concept, in
which the output of the hidden nodes is fed back as net inputs (Figure 5). These recurrent
architectures are straightforward, in that no changes to the standard backpropagation
algorithm are required. The recurrent values are treated as inputs, and the net performs a
gradient descent to minimize the error as it trains.
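
Because the recurrent values are simply treated as extra inputs, both schemes reduce to a
small amount of bookkeeping between iterations. The C sketch below illustrates this under
stated assumptions: the function names are hypothetical, and the state update (attenuated
previous state plus the latest output) is one common reading of the attenuated
self-feedback described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Elman-style input: current data followed by the hidden outputs saved
 * from the previous iteration. net_in must hold n_data + n_hidden values. */
void build_elman_input(double *net_in, const double *data, size_t n_data,
                       const double *prev_hidden, size_t n_hidden)
{
    assert(net_in != NULL && data != NULL && prev_hidden != NULL);
    memcpy(net_in, data, n_data * sizeof *net_in);
    memcpy(net_in + n_data, prev_hidden, n_hidden * sizeof *net_in);
}

/* Jordan-style state units: each unit keeps an attenuated copy of its own
 * previous value and adds the latest network output (assumed form). */
void update_jordan_state(double *state, const double *outputs, size_t n,
                         double attenuation)
{
    assert(state != NULL && outputs != NULL);
    for (size_t k = 0; k < n; ++k)
        state[k] = attenuation * state[k] + outputs[k];
}
```

In either case the combined input vector is passed through the unchanged feedforward and
backpropagation code.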

Rumelhart(15) proposed a different tack in approaching the treatment of time. In
his  recurrent  network,  the  network  is  treated  as  a  feedforward  network  that  grows  one 
layer  with  each  iteration.  This  algorithm  is  known  as  back-propagation  through  time 
(BPTT),  and  while  it  does  solve  time-dependent  problems  it  suffers  from  computer 
resource  limitations,  as  the  net  grows  larger  with  larger  input  sequences.  Rohwer  and 
Forest(13)  modified  this  approach  by  creating  multiple  copies  of  the  starting  network, 
with  each  copy  representing  a  time  step  in  the  training  sequence. 

Pineda(10)  generalized  Rumelhart's(15)  learning  rule,  while  eliminating  the 
requirement  to  unfold  or  duplicate  the  network  for  each  time  step.  The  net,  similar  in 
some  ways  to  the  Hopfield  network,  is  designed  to  adjust  the  weights  in  order  to  produce 
a fixed point (corresponding to a memory in a Hopfield net) when an input $x_i$ is presented
to the net in an initial state $x_f$. Unlike the Hopfield net, the weights are adjusted to
minimize  the  error  of  the  system  during  training. 

Pineda  (10)  later  stated  that  gradient  descent  cannot  create  new  fixed  points,  only 
move  existing  ones.  To  create  new  fixed  points  requires  "teacher  forcing",  which 
constrains  the  degrees  of  freedom  in  the  network  during  training,  and  releases  them 
during  recall.  He  also  states  that  there  is  no  guarantee  that  after  the  clamped  degrees  of 
freedom  are  released  that  the  system  will  be  stable  on  those  fixed  points,  and  that  the 
fixed  points  generated  by  the  clamping  may  become  "repellers"  rather  than  "attractors." 

Pineda's(10)  algorithm  for  training  recurrent  networks  was  adapted  and 
generalized  by  Pearlmutter(9)  to  minimize  the  net  error  as  a  function  of  the  temporal 
trajectory  of  the  states  of  the  network.  This  new  algorithm  trained  slowly  and 
occasionally  became  unstable,  and  was  modified  by  Fang  and  Sejnowski(4)  to  overcome 
these  obstacles. 


2.6 Real-Time Recurrent Learning (RTRL)

Another variation in the recurrent network taxonomy is the real-time recurrent
learning (RTRL) network proposed by Williams and Zipser(17) (Figure 6). It also
minimizes the net error along a temporal trajectory using gradient descent, and can be
used to recognize temporal sequences. Unlike the BPTT algorithm and many of its
derivatives, however, the network does not grow over long training sequences. The RTRL
network does suffer from large memory and processing requirements, as the algorithm
requires $O(n^4)$ computations per time step for n neurons. Because of this, the algorithm
can be unsuitable for any problem that requires a combination of multiple (>10) inputs,
multiple (>3) outputs and associated hidden units.
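
The $O(n^4)$ figure can be made concrete with a rough operation count. The sketch below
assumes the usual RTRL bookkeeping, in which one sensitivity value is kept for every
pairing of a unit with a weight and each value is refreshed with a sum over all n units.
The symbol m for the number of external inputs is introduced here for illustration, and
bias weights are ignored.

```latex
\begin{align*}
\text{weights} &= n(n+m) \\
\text{sensitivity values} &= n \cdot n(n+m) = n^{2}(n+m) \\
\text{multiply-adds per time step} &\approx n \cdot n^{2}(n+m) = n^{3}(n+m)
\end{align*}
```

For n = 10 and m = 10 this is roughly 20,000 multiply-adds per time step; doubling n to 20
raises it to roughly 240,000.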

Because  the  net  processes  information  by  passing  the  output  of  the  neurons  back 
as  inputs  at  the  next  point  in  time,  the  training  output  values  provided  to  the  net  must  be 
delayed  one  or  more  time  steps  as  compared  to  the  corresponding  network  training  inputs. 
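
The delay between training inputs and training outputs can be sketched in C as a simple
realignment of the target sequence. The function name `delay_targets` and the use of a
separate flag array for the steps that have no target yet are assumptions made for
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Fill delayed[t] with the target belonging to the input presented
 * `delay` steps earlier. has_target[t] is 0 for the first `delay` steps,
 * where no desired output is available yet. */
void delay_targets(const double *target, size_t steps, size_t delay,
                   double *delayed, int *has_target)
{
    assert(target != NULL && delayed != NULL && has_target != NULL);
    for (size_t t = 0; t < steps; ++t) {
        if (t >= delay) {
            delayed[t] = target[t - delay];
            has_target[t] = 1;
        } else {
            delayed[t] = 0.0;
            has_target[t] = 0;
        }
    }
}
```

With a delay of one step, the desired output for the first input is presented to the net
alongside the second input, and so on through the sequence.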


Figure 6: Basic RTRL architecture, with two outputs, two hidden nodes,
and two inputs. (Diagram labels: input data, recurrent output values.)

Figure 7: The subgrouped RTRL architecture, as implemented for this thesis. Each output is
paired with one or more hidden nodes. Note the connectivity between the nodes is the same
as in the basic RTRL architecture. (Diagram labels: subgroup 1, subgroup 2, input data,
recurrent output values.)

2.7 Subgrouped RTRL

To address the rapid growth in the computational requirements of RTRL,
Zipser(20) proposed a method of breaking the updating of the weights into subgroups
(Figure 7). This method can reduce the computational complexity of the algorithm from
$O(wn^2)$ to $O(w)$, where w represents the size of the weight matrix and n equals the number
of neurons. The connectivity within the network is unchanged, but the updating of each
weight depends on only a subset of the error generated by the recurrent neuronal outputs.
For g subgroups the weight updating algorithm is $g^2$ times faster, although each subgroup
now has less of the temporal "memory" than was found in the original algorithm.
Zipser(20) states that this can be compensated for by using more hidden units, while still
operating at a much faster processing rate.
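
The speedup quoted above can be checked with a rough operation count. The C sketch below
is an illustration under stated assumptions: the function name `rtrl_update_ops` and the
parameter m (number of external inputs) are hypothetical, bias weights are ignored, and
the subgroups are taken to be of equal size.

```c
#include <assert.h>

/* Rough multiply-add count for one pass of sensitivity updates with n
 * units, m external inputs and g equal subgroups. g = 1 corresponds to
 * the basic RTRL algorithm. */
unsigned long rtrl_update_ops(unsigned long n, unsigned long m, unsigned long g)
{
    assert(g > 0 && n % g == 0);
    unsigned long weights = n * (n + m);   /* size of the weight matrix, w */
    unsigned long group_units = n / g;     /* units whose error a weight tracks */
    /* each weight keeps group_units sensitivities, and each sensitivity
     * is refreshed with a sum over group_units terms */
    return weights * group_units * group_units;
}
```

For n = 12 units and m = 4 inputs, the count drops from 27,648 with one group to 3,072
with three groups, a factor of nine, in line with the $g^2$ speedup stated above.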

Like the RTRL algorithm, network training outputs must be delayed by one or
more time steps from the corresponding network inputs. The explanation of the RTRL
algorithms, and how subgrouping speeds up the network, is discussed in section 3.2. It is
the subgrouped RTRL algorithm that will be the focus of this thesis.

2.8  London  Exchange  Opening  Quotes 

There  are  many  examples  in  this  world  of  data  whose  changes  over  time  appear  on 
the  surface  to  be  random  or  chaotic,  but  are  dependent  on  some  underlying  mechanism 
that  drives  (or  influences)  the  path  the  data  takes.  One  example  of  this  would  be  the 
amplitude of a vocal signal, dictated by the mechanics of the vocal cords and the
phoneme  being  uttered  at  the  moment.  Another  possible  example  would  be  the 
movement  of  a  pilot's  head  in  XYZ  space  during  a  mission,  which  would  be  influenced 
by  the  voluntary  movements  (looking  at  the  Heads  Up  Device)  and  the  inertial  forces 
generated  by  aircraft  maneuvering.  The  ability  to  predict  the  path  of  a  signal  based  on 
past  behavior  could  be  very  beneficial,  and  would  depend  on  the  predictor's  ability  to 
internalize  and  emulate  the  mechanisms  or  forces  that  drive  the  signal  to  change.  For  this 
thesis,  the  opening  quotes  for  the  value  of  the  pound  on  the  London  Exchange  are  used  to 
study  the  subgrouped  RTRL  net's  ability  to  perform  this  function. 

2.9 Vector Quantized Image Sequences

The  ability  to  visually  recognize  objects  is  one  that  we  take  for  granted,  unless  we 
try  to  duplicate  this  ability  in  a  machine.  Generally,  this  is  performed  (or  attempted)  by 
extracting  key  visual  features  that  are  characteristic  for  the  object  being  identified.  The 
data  used  for  this  thesis  was  derived  from  CAD  generated  3-D  images  of  an  M-60  tank, 
an  M35  truck,  a  BTR60  armored  personnel  carrier,  a  T62  tank,  and  an  M2  infantry 
fighting  vehicle.  Each  image  was  viewed  from  multiple  (592)  different  angles  around  and 
above  the  vehicle  representations,  and  the  data  was  processed  and  vector  quantized(3)  to 
produce  64  codewords.  Each  codeword  (0  -  63)  represents  the  visual  information  of  areas 
of similar aspect or characteristic view. Codewords may be associated with one or more
of the vehicles; the key information is contained in the sequences of codewords,
representing  a  changing  image  aspect  over  time. 

2.10  Summary 

Recurrent neural networks have grown in complexity from a basic feeding back of
the outputs of neurons at higher levels(2)(6) to algorithms that specifically
incorporate  the  function  of  time  into  the  weight  updates.  Because  of  this,  these 
algorithms  are  uniquely  capable  of  following  the  "trajectory"  of  the  data  through  the  time 
steps,  allowing  the  network  to  predict  what  can  be  expected  to  occur  next  and  respond 
accordingly.  This  added  dimension  of  time  expands  the  observed  behavior  of  neural  nets 
in  generating  a  probability  function  as  an  output,  in  that  the  preceding  time  steps  add  to 
the  network's  ability  to  generate  the  most  likely  output. 

The  subgrouped  RTRL  algorithm  is  a  flexible,  time-dependent  method  of 
predicting  what  the  most  likely  output  should  be,  given  the  current  inputs  fed  into  the  net 
at  this  time  and  the  inputs  that  were  fed  into  the  net  in  the  past.  As  such,  its  abilities  and 
limitations  need  to  be  evaluated  and  explored.  The  full  description  of  this  algorithm,  and 
the tests performed in this thesis to evaluate its effectiveness, are documented in Chapter
III.

III. Methodology


3.1  Introduction 

Chapter II provided an overview of how the standard multilayer network with
backpropagation  learning  operates,  as  well  as  a  review  of  the  more  prominent  networks 
that  are  designed  for  the  processing  of  temporally-dependent  data.  The  real-time 
recurrent  learning  (RTRL)  neural  network  and  the  subgrouped  RTRL  were  highlighted 
due  to  their  importance  to  this  thesis.  This  thesis  involves  the  modification  of  the  RTRL 
C  code  written  by  Capt  Lindsey(7)  into  the  subgrouped  RTRL  algorithm,  and  the 
enhancement  of  the  performance  and  learning  effectiveness  through  several 
modifications.  The  utility  of  the  algorithm  is  demonstrated  via  its  application  to  the 
prediction  of  the  value  of  the  English  pound  based  on  the  opening  values  at  the  London 
Exchange,  and  the  identification  of  vehicle  images  based  on  image  features  as  the 
viewing  aspect  changes  with  time. 

This  chapter  covers  the  development,  modifications  and  testing  of  the  subgrouped 
RTRL  program.  The  algorithm  for  the  subgrouped  RTRL,  and  how  it  differs  from  the 
basic  RTRL  algorithm,  is  described  and  discussed.  Also,  the  training  and  testing 
methodology  is  reviewed. 

3.2  Subgrouped RTRL Algorithm

Like  the  basic  RTRL  algorithm,  the  subgrouped  RTRL  is  an  error  gradient 
following  algorithm  for  a  completely  recurrent  network.  The  subgrouped  RTRL 
algorithm  is  structurally  the  same  as  the  basic  RTRL  algorithm;  the  same  connectivity 
exists  between  the  nodes  as  in  the  RTRL.  The  main  difference  lies  in  the  extent  to  which 
each  node  influences  the  weight  updates  of  the  network. 

In the implementation of the subgrouped RTRL for this thesis, some restrictions
have been incorporated into the algorithm to simplify the design. Both Lindsey's(7)
original RTRL code and the modified subgrouped RTRL allow the user to specify the
number of output nodes, input nodes and hidden nodes. The user's selection of the
number of hidden nodes, however, is changed, if required, to make the number of hidden
nodes an integer multiple of the number of output nodes selected. Each output node is
then grouped with an equal number of hidden nodes. As in the basic RTRL algorithm,
the linear and/or sigmoidal outputs of the output and hidden nodes are fed back into the base
of the net, comprising part of the input for the next iteration.
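
The grouping rule above can be sketched in C as follows. This is only an illustration, not the code from Appendix B: the function names are hypothetical, and rounding the hidden-node count up (rather than down) is an assumption, since the text only says the count is changed "if required".

```c
/* Hypothetical helper: adjust the requested hidden-node count so that it is
 * an integer multiple of the number of output nodes.
 * ASSUMPTION: the count is rounded up; the thesis does not state a direction. */
int adjust_hidden_nodes(int requested_hidden, int num_outputs)
{
    int remainder = requested_hidden % num_outputs;
    if (remainder == 0)
        return requested_hidden;
    return requested_hidden + (num_outputs - remainder);
}

/* Nodes per subgroup: one output node plus its equal share of hidden nodes. */
int subgroup_size(int hidden, int num_outputs)
{
    return 1 + hidden / num_outputs;
}
```

For the phoneme configuration used later in this chapter (6 outputs, 12 hidden nodes), this gives subgroups of three nodes each.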

The  subgrouped  RTRL  algorithm  was  proposed  by  Zipser(20)  in  response  to 
observations  that  the  RTRL  algorithm  required  a  great  deal  of  computation  to  train.  This 
is due to the $O(n^4)$ complexity of each time step, with n representing the total number of
output and hidden nodes. This thesis will review the subgrouped RTRL algorithm, and
demonstrate where it deviates from the basic RTRL algorithm. Terms used in this
derivation will be consistent with those used in Zipser's(20) article and Lindsey's(7)
thesis.  Specific  portions  of  the  discussion  are  attributed  to  the  subroutines  in  the  C  code 
in  Appendix  B,  to  help  the  reader  associate  the  algorithm  to  its  implementation. 


Basic  terms: 


The network (Figures 6 & 7) consists of n neural node units and has m external
inputs (the first of which is a bias of 1). At time t,

the output of the kth neuron is represented by $y_k(t)$, where k ranges from 0 to n - 1.

the summed activation value of neuron k at time t is $s_k(t)$.

the jth external input into the net is $x_j(t)$, where j ranges from 0 to m - 1.

the m + n net inputs comprise the input vector at time t, $z_j(t)$, where j ranges from
0 to m + n - 1. This is shown by:

$$z_j(t) = \begin{cases} x_j(t) & \text{if } j \in I \\ y_j(t) & \text{if } j \in U \end{cases} \tag{9}$$

where U identifies the subset of the j indices in $z_j$ derived from the n
network outputs of the prior iteration, while I identifies the subset of j
indices in $z_j$ in which the jth member is one of the m external inputs.

the error measured at neuron k is represented by $e_k(t)$.

the Kronecker delta function, $\delta_{ik}$, equals 1 if i = k, and is 0 otherwise.

the non-linear sigmoidal function at neuron k is shown as $f_k$.

the p matrix, $p^k_{ij}$, represents $\partial y_k(t+1)/\partial w_{ij}$, where in the original RTRL network,
i ranges from 0 to n - 1, j ranges from 0 to m + n - 1, and k ranges from 0 to n - 1.
In the subgrouped RTRL net, i ranges from 0 to g - 1, where g is the size of the
net subgroups, j ranges from 0 to m + n - 1, and k ranges from 0 to n - 1.

As was covered in the discussion of the basic backprop net (Chapter II), there is a
weight matrix $w_{ij}$, where i is the index of one of the n neurons, and j refers to one of m + n
inputs.


Subroutine  Compute  Error: 

This  routine  calculates  the  error  at  the  net  outputs,  based  on  the  net's  prediction  of 
what the output should be, which was calculated during the prior iteration. The error is
found  by  taking  the  difference  between  the  linear  or  sigmoidal  output  of  those  nodes 
designated  as  "output"  nodes,  and  the  desired  output  of  the  network.  The  error  at  each 
output  node  k  is  defined  as 


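$$e_k(t) = \begin{cases} d_k(t) - y_k(t) & \text{if } k \in T \\ 0 & \text{otherwise} \end{cases} \tag{10}$$

where $d_k(t)$ is the desired output of neuron k, and T represents the subset of neurons that produce the net outputs.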

In  the  original  RTRL  code  (Figure  6),  the  first  k  nodes  were  output  nodes,  while 
the  remaining  n  -  k  nodes  were  the  hidden  nodes.  For  this  implementation  of  the 
subgrouped RTRL (Figure 7), the output nodes are every ith neuron, where i = (n /
number of net outputs).

The total mean squared network error is then calculated as

$$J_{total} = \sum_{t} \frac{1}{2} \sum_{k \in U} \left[ e_k(t) \right]^2 \tag{11}$$

Subroutine ResetDelwS:

This  subroutine  multiplies  the  delta  weight  matrix  with  a  momentum  term  after 
each  iteration  of  data  is  processed  by  the  net,  allowing  the  net  to  use  momentum  while 
training.  The  rationale  for  the  addition  of  the  momentum  term  is  discussed  in  section 
3.3.2. 

Subroutine  Propagate: 

The $y_k(t-1)$ outputs of the n neurons that were computed during the last iteration
are incorporated in the input vector z(t). The nodal activation, or the weighted sum of the
inputs for each node ($s_k$), is calculated as

$$s_k(t) = \sum_{j \in U \cup I} w_{kj}\, z_j(t) \tag{12}$$

In  other  words,  each  neuron  sums  all  of  its  inputs  multiplied  by  their  respective 
weights  to  form  the  activation  value  for  that  neuron  at  time  t. 

Subroutine  Compute  Output: 

The output for the following iteration is calculated next. This is expressed by

$$y_k(t+1) = f_k(s_k(t)) \tag{13}$$

where $f_k$ is the sigmoidal function for the hidden nodes, and can be sigmoidal or linear for
the output nodes. This $y_k(t+1)$ term is the network's predicted value of what the desired
value will be next iteration.

Subroutine  Update: 

The updating of the weights in this algorithm is the most complex and
computationally intensive portion of the RTRL algorithm. It was due to the
computational requirements of this function that the subgrouped RTRL algorithm was
proposed by Zipser(20), and utilized for this thesis. To understand this, we must look at
the effects of using recurrence in the network.


Figure 8: The output of neuron y2 at time t+1 is dependent on the highlighted
weight connections and their inputs.

[Figure 9 diagram: inputs Bias, x1(t+1), x2(t+1), y1(t+1), y2(t+1), y3(t+1), y4(t+1); outputs y1(t+2), y2(t+2), y3(t+2), y4(t+2)]

Figure 9: At the next iteration, the output of neuron y2 has been fed back into the
network as an input, thereby affecting the output (and error) of each of the
neurons at t+2. Note that neuron y2 can affect its own output (dashed line) during
the next iteration.

In order to calculate the update to the weight $w_{ij}$ for the next iteration using the
RTRL algorithm, we must look at the error in the net caused by that weight. Weight $w_{ij}$
affects the output of neuron i at time t (Figure 8). Since the output of neuron i is fed
back into the net along with whatever error it may contain, $w_{ij}$ impacts the error in the
next iteration of all the neurons (Figure 9).


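The relationship between the energy level in the network and the network weights
is represented by:

$$\frac{\partial J(t)}{\partial w_{ij}} = -\sum_{k \in U} e_k(t)\, \frac{\partial y_k(t)}{\partial w_{ij}} \tag{14}$$

where $J(t) = \frac{1}{2} \sum_{k \in U} [e_k(t)]^2$ is the network error at time t.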

Because this is a recurrent network, a change in $w_{ij}$ at time t affects the output and error of
neuron k at time t + 1. For the RTRL network, this is expressed by

$$\frac{\partial y_k(t+1)}{\partial w_{ij}} = f_k'(s_k(t)) \left[ \sum_{l \in U} w_{kl}\, p^l_{ij}(t) + \delta_{ik}\, z_j(t) \right] \tag{15}$$

where $k \in U$, $l \in U$, and $j \in U \cup I$. The term $p^l_{ij}$ represents the effect that a change in
weight $w_{ij}$ would have on the output of neuron l at a following iteration. Since neuron l is
then fed back into the network and becomes an input to neuron k at time t + 1, the
$\sum_{l \in U} w_{kl}\, p^l_{ij}$ term is a summation of each of the weights associated with the recurrent inputs
to neuron k, times the changes in the recurrent inputs that were caused by weight $w_{ij}$. In
other words, this term sums the indirect effect that weight $w_{ij}$ has on the output of neuron
k from changing the output of neuron l.

If neuron i and neuron k are the same (separated by time), the effect of a change in
weight $w_{ij}$ on the output of neuron k can be expressed by the added term

$$\delta_{ik}\, z_j(t) \tag{16}$$

The $\delta_{ik}$ term is the Kronecker delta function, which equals one if i = k, and equals zero
otherwise; $z_j(t)$ represents the jth input to neuron i. The need for this term can be
explained as follows: At time t, neuron i receives the value of input $z_j(t)$ multiplied by
weight $w_{ij}$. At t+1, neuron k receives the output of neuron i as an input. Note that in this
case there is no intermediate neuron l for weight $w_{ij}$ to influence neuron k through, hence
no $p^l_{ij}$ term. If neuron i and neuron k are the same, but at different time steps, weight
change $\Delta w_{ij}$ affects neuron k's output indirectly through changing its output directly
during the previous time step.

The change in the output of neuron k with respect to a change in the weight $w_{ij}$ can
be represented as $p^k_{ij}(t+1)$, by the equation

$$p^k_{ij}(t+1) = \frac{\partial y_k(t+1)}{\partial w_{ij}} \tag{17}$$

Thus, equation (17) can be rewritten

$$p^k_{ij}(t+1) = f_k'(s_k(t)) \left[ \sum_{l \in U} w_{kl}\, p^l_{ij}(t) + \delta_{ik}\, z_j(t) \right] \tag{18}$$

The $p^k_{ij}$ term is implemented in the C code as an n x n x (m+n) matrix (p matrix), and is
used to update the n x (m+n) weight matrix. This is the direct cause of the $O(n^4)$
complexity of this algorithm, and the reason why RTRL is too slow to train for other than
small problems, limited in the number of outputs or of hidden nodes. Each weight is
updated based on its effect on all of the neurons in the net. To avoid this, the net can be
subgrouped in such a way that when a weight is updated, it is only based on its effect on
the neurons within its group.

In the subgrouped RTRL implementation used for this thesis, the number of
groups in the net is equal to the number of network outputs. Because of this, the p matrix
($p^k_{ij}$) becomes an s x n x (m+n) matrix, where s equals the number of nodes in each of the
subgroups in the net. The size of the p matrix has been reduced by a factor of g, which is
the number of groups in the net.

In the subgrouped RTRL algorithm, equation (18) becomes

$$p^k_{ij}(t+1) = f_k'(s_k(t)) \left[ \sum_{l \in U_g} w_{kl}\, p^l_{ij}(t) + \delta_{ik}\, z_j(t) \right] \tag{19}$$

where $k \in U_g$, $l \in U_g$, $i \in U_g$, and $j \in I \cup U$. $U_g$ is that subset of the recurrent neuronal
outputs that belongs to the group containing neuron i. In other words, the
$\sum_{l \in U_g} w_{kl}\, p^l_{ij}$ term
represents the summation of the recurrent (neuron l) inputs to neuron k, where neuron l is
from within neuron k's subgroup, times the change in neuron l's output caused by changes
to weight $w_{ij}$ during a previous iteration. The effect that weight $w_{ij}$ has on neurons and
weights outside the subgroup is not calculated. The consequence of this change is that
the p matrix is smaller, and the net runs approximately $g^2$ times faster. Zipser(20)
hypothesized that the subgrouping of the network may impact net accuracy, but believed
this can be compensated for with the addition of extra hidden nodes. He also stated that
the time delay caused by the additional nodes should be more than compensated for by
the speedup caused by the subgrouping.

All of this theory being said, the Update subroutine begins by first calculating the
delta weights ($\Delta w_{ij}$) based on the p matrix calculated during the last iteration. In the
RTRL algorithm, this weight update is derived from equation 16 as

$$\Delta w_{ij} = \alpha \sum_{k} e_k(t)\, p^k_{ij}(t) \tag{20}$$

where $i \in U$, $j \in U \cup I$, $k \in U$, and $\alpha$ is the learning constant. In the subgrouped RTRL
algorithm, this becomes

$$\Delta w_{ij} = \alpha\, e_k(t)\, p^k_{ij}(t) \tag{21}$$

where $i \in U_g$, $j \in U \cup I$, and $k \in U_g$. Thus only the error at each group's output node
drives the weight updates for the weights associated with that group.

Next, the subroutine calculates a new p matrix based on the above algorithm, and
saves the new p matrix for the next weight update.
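
The two stages of the Update subroutine (equation 21 followed by equation 19) might be sketched in C as below. This is a minimal illustration under several assumptions, not the Appendix B code: the names are hypothetical, groups are taken to be contiguous blocks of g nodes, z holds the m external inputs followed by the n recurrent values, and the p matrix is stored as p[i][a][j], where a indexes the nodes inside the group containing neuron i. Summing the error over the group reduces to equation (21) when only the group's output node carries a non-zero error.

```c
#include <stdlib.h>
#include <string.h>

/* One subgrouped RTRL update step (hypothetical sketch).
 *   n, m    nodes and external inputs; g = nodes per subgroup (n % g == 0)
 *   w       n x (m+n) weights, row-major (read only here)
 *   z       input vector z(t): m externals, then n recurrent values
 *   fprime  f'_k(s_k(t)) for every node k
 *   e       error e_k(t) for every node (zero for hidden nodes)
 *   p       n x g x (m+n) sensitivities, updated in place
 *   dw      n x (m+n) delta weights, accumulated into */
void subgrouped_update(int n, int m, int g, double alpha,
                       const double *w, const double *z,
                       const double *fprime, const double *e,
                       double *p, double *dw)
{
    int cols = m + n;
    size_t psize = (size_t)n * g * cols;
    double *p_new = malloc(psize * sizeof *p_new);
    int i, j, a, b;

    if (p_new == NULL)
        return;

    for (i = 0; i < n; i++) {
        int base = (i / g) * g;  /* first node of the group containing i */
        for (j = 0; j < cols; j++) {
            /* Equation (21): delta weight from the previous p matrix. */
            double grad = 0.0;
            for (a = 0; a < g; a++)
                grad += e[base + a] * p[((size_t)i * g + a) * cols + j];
            dw[i * cols + j] += alpha * grad;

            /* Equation (19): new sensitivities, restricted to the group. */
            for (a = 0; a < g; a++) {
                int k = base + a;
                double sum = 0.0;
                for (b = 0; b < g; b++) {
                    int l = base + b;
                    sum += w[k * cols + m + l]
                         * p[((size_t)i * g + b) * cols + j];
                }
                if (k == i)
                    sum += z[j];  /* Kronecker delta term */
                p_new[((size_t)i * g + a) * cols + j] = fprime[k] * sum;
            }
        }
    }
    memcpy(p, p_new, psize * sizeof *p_new);
    free(p_new);
}
```

Because the inner sums run over g group members instead of all n nodes, and the p matrix holds g rather than n entries per weight, the work per weight drops from roughly n x n to g x g operations.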

3.3  Network  Parameters 

The  RTRL  algorithm,  as  implemented  by  Capt.  Randall  L.  Lindsey(7),  was  able  to 
perform  several  time  dependent  tasks  quite  well.  These  tasks,  however,  required  only  one 
or  two  outputs.  When  research  on  this  thesis  was  begun,  it  was  quickly  determined  that 
the  algorithm  as  outlined  in  Lindsey's  thesis  was  not  appropriate  for  some  of  the  larger, 
more  complex  tasks.  Processing  time  required  for  training  on  phoneme  broad  classes  for 
more  than  one  voice  was  measured  in  days.  The  outputs  of  the  network  would  tend  to 
lock onto zero or one, even if the output was in error. To avoid this problem, research
into means of improving the training time and accuracy of the network was initiated.
This led to the exploration of the subgrouped network as proposed by Zipser(20), as well
as several other methods of manipulating network performance.

The  following  is  a  discussion  of  the  network  parameters  or  algorithms  added  to 
modify  the  learning  behavior  of  the  implementation  of  the  subgrouped  RTRL  network 
used  for  this  thesis,  in  an  attempt  to  improve  network  learning  speed  and  accuracy. 
Evaluation of the initial learning rate, momentum, minimum sigmoidal derivative, teacher
forced learning, and weight update skipping error threshold parameters was performed
using a TIMIT voice file processed by the Payton algorithm(8). The voice file selected has 389
data points of 20 inputs and 6 outputs, each output corresponding to one of six broad
classes  of  phonemes.  Each  training  run  using  these  parameters  was  performed  on  ten 
networks  with  different  initial  random  weights,  and  the  results  of  the  training  runs  were 
averaged  for  a  composite  graph  of  the  network  output  accuracy.  The  graphs  showing  the 
composite  accuracy  for  the  above  parameters  are  shown  in  Chapter  IV. 

One  network  parameter  was  evaluated  without  using  the  voice  file  data.  It  allows 
the  net  to  treat  the  training  data  as  continuous  between  the  end  of  one  epoch  and  the  start 
of  the  next,  and  is  discussed  in  section  3.3.6.  This  capability  was  added  to  address  a 
byproduct  of  the  way  in  which  the  RTRL  algorithm  learns,  and  so  is  discussed  using  the 
type of training problem in which this byproduct can be observed.

3.3.1  Variable Learning Rate

The  subgrouped  RTRL  network  used  for  this  thesis  reduces  the  learning  rate 
(alpha)  by  a  factor  of  ten  whenever  the  network  error  rises  more  than  1%  over  the 
minimum  error  reported  to  that  point,  or  if  the  difference  in  error  between  the  current 
epoch and the previous epoch is less than 0.0000001. This is done to prevent the network
from  becoming  unstable  if  the  learning  rate  is  too  high,  and  to  improve  the  network  error 
minimization  when  the  net  error  has  reached  a  plateau. 
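
The learning rate rule just described can be sketched in C as follows. The function name and argument list are hypothetical, and taking the absolute value of the epoch-to-epoch difference is an assumption about how "difference in error" is measured.

```c
#include <math.h>

/* Returns the learning rate to use for the next epoch.
 *   epoch_error  error of the epoch just completed
 *   prev_error   error of the epoch before it
 *   min_error    lowest epoch error reported so far
 * ASSUMPTION: the plateau test uses the absolute difference in error. */
double adjust_learning_rate(double alpha, double epoch_error,
                            double prev_error, double min_error)
{
    int rose = epoch_error > 1.01 * min_error;            /* more than 1% above best */
    int plateau = fabs(epoch_error - prev_error) < 1e-7;  /* error has stalled */
    return (rose || plateau) ? alpha / 10.0 : alpha;
}
```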

Setting the learning rate at a high or low level at the beginning of training has a
definite impact on the network's ability to learn a task over the training period. Set the
rate too high, and the net immediately adapts to the inputs at time t, forgetting previously
learned behavior and therefore losing its ability to generalize. Start with too low an
initial learning rate, and the net learns slowly and may become stuck in a local minimum.

To  observe  the  effect  of  the  initial  learning  rate  on  network  training,  the  net  was 
trained  using  the  Payton  processed  voice  file  for  200  epochs,  with  initial  learning  rates  of 
0.1,  0.01  and  0.001.  The  network  configuration  was  the  20  neural  activity  level  inputs 
produced  by  the  Payton  algorithm,  6  sigmoidal  outputs,  and  12  sigmoidal  hidden  nodes. 
The  training  output  was  delayed  two  time  steps.  For  this  problem,  the  best  learning  results 
were  obtained  using  an  initial  rate  of  0.01.  The  differences  in  network  performance 
caused  by  different  initial  learning  rates  are  discussed  in  Chapter  IV. 

3.3.2  Momentum 

The  use  of  momentum  to  speed  up  the  learning  of  a  backpropagation  network  is 
well  established  (14).  Use  of  momentum  tends  to  dampen  out  the  oscillations  in  network 
accuracy  during  learning,  and  carry  the  net  down  the  averaged  out  path  to  an  error 
minimum.  To  add  momentum  to  the  network,  the  delta  weights  are  simply  multiplied  by 
the momentum factor after the weights are updated. This is summed with the next
calculated set of delta weights, to carry over a portion of the weight update from
time t-1. The momentum factor is a parameter read by the network during initiation.
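
A minimal C sketch of this momentum scheme is shown below. The function name is hypothetical, and the weights and delta weights are treated as flat arrays for brevity; the next iteration's Update step is assumed to add its newly calculated delta weights onto the scaled values left in dw.

```c
/* Apply the accumulated delta weights, then scale them by the momentum
 * factor so that a fraction carries over into the next iteration.
 * ASSUMPTION: the caller adds the next set of delta weights onto dw. */
void apply_and_decay(int count, double *w, double *dw, double momentum)
{
    int i;
    for (i = 0; i < count; i++) {
        w[i] += dw[i];
        dw[i] *= momentum;
    }
}
```

With a momentum factor of 0, dw is cleared after every update and the net behaves as if no momentum were used.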

The  impact  of  using  momentum  was  measured  by  training  the  subgrouped  RTRL 
net using momentum set at 0, 0.5 and 0.9, with a network configuration of 20 inputs, six
sigmoidal  outputs  and  12  sigmoidal  hidden  nodes.  The  network  with  a  momentum  of  0.9 
demonstrated  the  highest  accuracy  and  lowest  error  during  training,  followed  by  the  net 
with  0.5  momentum  factor.  This  indicates  that  momentum  does  improve  training 
performance  for  this  problem.  Further  discussion  of  this  test  is  provided  in  Chapter  IV. 

3.3.3  Minimum value for output derivative factor

For an output y(t), the derivative of the sigmoid transfer function is y[t](1-y[t]).
One common problem encountered when using neural networks for categorization of
inputs is that the derivative of the sigmoid output function tends towards zero when the
output approaches zero or one.

$$y[t]\,(1 - y[t]) = (1)(1 - 1) = (0)(1 - 0) = 0 \tag{22}$$

Even  if  the  output  is  wrong,  the  error  feedback  used  to  update  the  weights  is  zero 
or  very  small.  This  can  cause  an  output  to  "hang"  or  latch  on  a  wrong  value,  slowing 
down learning tremendously. Van Ooyen and Nienhuis (16) proposed the use of a
different energy equation,

$$E = -\sum_{j} \left[ t_j \ln z_j + (1 - t_j) \ln (1 - z_j) \right] \tag{23}$$

where $t_j$ represents the desired nodal output, and $z_j$ represents the actual output of node j.
When  this  function  is  used,  the  partial  derivative  of  the  error  function  contains  the  inverse 
term  to  the  sigmoid  function  derivative,  canceling  it  out.  Thus  during  weight  updates  the 
error  at  the  output  is  fed  back  directly  without  the  sigmoid  derivative  term,  avoiding  the 
latching  of  the  neuron  in  the  wrong  state  during  training.  Rather  than  redefining  the  error 
function  for  the  subgrouped  RTRL  network  however,  a  similar  effect  was  gained  by 
setting  a  minimum  value  for  the  sigmoid  derivative  of  the  output  neurons.  When  the 
derivative falls below the minimum set value, the set value is used for updating the p
matrix (equation 24). Above the set value, the sigmoid derivative value is used.

$$p^k_{ij}(t+1) = f_k'(s_k(t)) \left[ \sum_{l \in U_g} w_{kl}\, p^l_{ij}(t) + \delta_{ik}\, z_j(t) \right] \tag{24}$$

This one change appears to have caused the biggest improvement in learning
effectiveness, compared to the other variables used to manipulate the network.
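
The derivative floor described in this section amounts to a one-line clamp. The C sketch below uses a hypothetical function name and takes the minimum value as an argument; in the thesis program it would be applied to the output neurons only.

```c
/* Sigmoid derivative y(1 - y), clamped from below at min_deriv so that a
 * saturated output node (y near 0 or 1) still receives a usable gradient. */
double floored_sigmoid_derivative(double y, double min_deriv)
{
    double d = y * (1.0 - y);
    return (d < min_deriv) ? min_deriv : d;
}
```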

3.3.4  Teacher forced learning

Williams  and  Zipser(17)  stated  that  learning  could  be  accelerated  if  teacher  forced 
learning  is  used.  Teacher  forced  learning  involves  replacing  network  output  values  from 
time t with the desired values (after computing the error), which are then used as the
recurrent inputs at time t + 1. This helps to train the network faster for some problems, as
the  net  does  not  have  to  unlearn  weights  as  the  recurrent  output  values  transition  from 
incorrect  to  correct  values  during  training.  In  some  cases  however,  this  approach 
backfires.  When  the  net  is  being  tested  with  new  data,  any  erroneous  outputs  are  fed  back 
as  inputs.  As  the  network  weights  were  trained  to  work  with  the  "correct"  outputs, 
erroneous  outputs  can  make  the  net  unstable.  In  this  case,  teacher  forced  learning  causes 
the  correct  response  to  represent  an  energy  repellor  rather  than  attractor.  The  RTRL  code 
was modified to allow for teacher forced learning if the user selects it. This is set using a
flag in the parameters file accessed by the net upon initialization.
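
Teacher forcing as described above can be sketched in C as a single pass over the node outputs, performed after the error has been computed and before the outputs are fed back. The function name and the is_output flag array are hypothetical.

```c
/* Replace the values of the output nodes with their desired values so that
 * the targets, not the net's own outputs, become the recurrent inputs.
 * is_output[k] != 0 marks output nodes; target[k] is read only for those.
 * Hidden nodes have no desired value and are left unchanged. */
void teacher_force(int n, const int *is_output, const double *target, double *y)
{
    int k;
    for (k = 0; k < n; k++) {
        if (is_output[k])
            y[k] = target[k];
    }
}
```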

3.3.5  Skipping  weight  updates  for  learned  outputs 

Allred and Kelly(1) proposed performing backpropagation of the error during
training only when the error for a neuron was greater than the learning rate value squared
($\alpha^2$). As the network error decreases and the number of data iterations that skip error
backpropagation passes 90%, $\alpha$ is decreased. This idea was incorporated into the
recurrent  network  code  by  feeding  a  parameter  to  the  network  during  initialization  which 
sets  an  error  threshold  for  weight  updates.  When  the  output  error  is  below  the  threshold, 
the  weights  are  not  updated.  Since  the  calculation  of  the  weight  updates  in  the  RTRL 
algorithm  is  the  most  time  consuming  part  of  training  the  network,  skipping  weight 
updates  holds  the  potential  for  speeding  up  network  training  considerably. 
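
The threshold test might look like the following C sketch. The text does not say whether the comparison is made per output node or on a combined error, so comparing the magnitude of each node's error against the threshold is an assumption, as is the function name.

```c
#include <math.h>

/* Returns 1 if the weights should be updated for this iteration.
 * ASSUMPTION: an update is needed when any node's error magnitude
 * reaches the threshold; otherwise the costly update is skipped. */
int needs_weight_update(int n, const double *e, double threshold)
{
    int k;
    for (k = 0; k < n; k++) {
        if (fabs(e[k]) >= threshold)
            return 1;
    }
    return 0;
}
```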

3.3.6  Continuity of Recurrence Between Epochs

During  each  iteration  of  the  RTRL  training  the  net  output  and  hidden  node  values 
are  forwarded  as  inputs  for  the  calculation  of  the  next  iteration.  The  exception  to  this  is 
at  the  end  of  the  epoch,  when  the  output  value  of  the  neurons  (and  the  p  matrix)  are 
replaced  with  zeroes.  When  training  the  RTRL  network  for  some  functions,  it  was  found 
to  be  better  not  to  zero  out  the  net  outputs  at  the  end  of  each  epoch.  This  is  due  to  the 
discontinuity  the  zeroing  of  the  outputs  induces  at  the  beginning  of  each  epoch.  An 
example  of  this  phenomenon  can  be  seen  when  training  the  network  to  emulate  a  low 
pass Butterworth filter (paragraph 3.4.3). At the initial iteration of the epoch (t=0), if the
training  data  is  zero  and  the  output  from  the  previous  epoch  has  been  zeroed,  the  net  sees 
only  the  bias  as  a  non  zero  input  (Figure  10). 

[Figure 10 diagram: a bias input of 1, input data of 0, and zeroed recurrent values at t = 0 feeding the output and hidden nodes]

Figure 10: The RTRL network (1 output, 4 hidden nodes) at t=0, after zeroing the
output values from the last iteration of the prior epoch

The weighted bias drives the output values of the RTRL neurons, which are fed
back into the net during the next iteration. The net treats this input as an impulse, and
generates the filter's impulse response (Figure 11).

Figure 11: The recurrent network shows the Butterworth filter's impulse response at the beginning
of the epoch, after training with inputs zeroed at the beginning of each epoch

The  enabling  of  this  continuity  option  causes  the  output  from  the  final  iteration  of 
the  previous  epoch  to  be  forwarded,  as  the  RTRL  net  does  in  all  other  iterations,  as  inputs 
into the calculation of the next iteration, the first of the current epoch. One complication
to  this  option  is  the  fact  that  all  RTRL  training  files  have  some  delay  imposed  in  the 
network  outputs,  due  to  the  time  dependent  nature  of  the  network.  To  make  the  training 
on  the  data  truly  continuous,  the  desired  outputs  generated  by  the  last  data  iterations  in  the 
training  file  must  be  placed  as  training  outputs  at  the  beginning  of  the  file.  If  the  outputs 
are  delayed  for  two  iterations  for  example,  the  outputs  associated  with  the  last  two  data 
iterations  must  be  placed  as  the  desired  output  with  the  first  two  iterations  of  data  at  the 
start  of  the  file. 
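
The rearrangement of the delayed training outputs can be sketched in C as a circular shift. The function and array names are hypothetical; undelayed[t] stands for the desired response to the input at iteration t, and delayed[t] is the training output actually paired with iteration t in the file.

```c
/* Build delayed training outputs that wrap around the end of the file, so
 * the responses to the last `delay` inputs appear as the desired outputs
 * for the first `delay` iterations.
 * Requires 0 <= delay <= len. */
void wrap_delayed_targets(int len, int delay,
                          const double *undelayed, double *delayed)
{
    int t;
    for (t = 0; t < len; t++)
        delayed[t] = undelayed[(t - delay + len) % len];
}
```

With a delay of two iterations and five data points, the responses to points 3 and 4 become the desired outputs for points 0 and 1.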

3.4  Subgrouped  RTRL  Functional  Capabilities 

To demonstrate the functional equivalence and/or improvements gained using the
subgrouped RTRL code over the original RTRL program explored by Capt. Lindsey,
several of the same tests were performed as were described in his thesis(7). The repeated
tests were the Exclusive OR problem, the internal state problem, and the Infinite Impulse
Response (IIR) filter simulation. The subgrouped RTRL was also tested by training it to
categorize the phoneme groups in a sample of digitized voice that had been pre-processed
by the Payton(8) algorithm. During the training/testing of the network on the
pre-processed voice data, the differences in performance in training speed and accuracy for
this task between the subgrouped RTRL and the original RTRL code were measured.

3.4.1  Exclusive  OR  (XOR) 

The  Exclusive  OR  problem  is  a  classic  test  of  the  performance  of  a  neural 
network,  as  it  requires  the  identification  of  two  distinct  and  separate  areas  in  the  solution 
space.  This  is  a  task  beyond  the  capabilities  of  a  single  layer  network.  From 
appearances,  the  RTRL  network  seems  to  be  a  single  layer  network,  and  therefore 
incapable  of  learning  an  XOR  solution. 


The hidden nodes of an RTRL network, unlike a standard backpropagation network, do not
feed directly into a higher layer during the processing of a data set at the input layer.
Instead, they feed into the output layer, and to themselves, during the next iteration.
This temporal means of connecting the hidden nodes to the output nodes enables the
network to solve the XOR problem. To allow for the temporal delay in passing the hidden
node outputs to the output nodes, however, the desired output of the network must be
shifted ahead in time.

Figure 12: For the XOR function, valid input values cannot be isolated into one contiguous area.

The network configuration used to solve the XOR problem with the subgrouped
RTRL network was identical to the network used by Lindsey's code, i.e. two external
inputs, one sigmoidal output and four hidden sigmoidal units. The ones and zeros used as
inputs were generated randomly, and the training output was the XOR of the
two inputs, delayed by two time steps. Using 1024 training vectors the network was
trained over 20 epochs, and then tested using the trained network weights on a test XOR
sequence.

The  training  of  the  networks  was  repeated  using  non-integer  training  values 
between 0 and 1, with the range 0 to 0.5 treated as a zero input, and 0.5 to 1 equivalent to
an input of 1 for the determination of the XOR output. Using a two input, one sigmoidal
output node and 5 hidden node configuration, the net was trained for 300 epochs through
the 512 training vectors. The trained net was then tested on a 1024 vector non-integer test
set. 

While  the  net  scored  perfectly  on  the  integer  portion  of  this  test,  it  only  scored  in 
the  91+  percentile  when  trained  and  tested  on  non-integers.  Interestingly  enough,  the 
misses  were  not  at  the  boundary  data  values  where  one  would  expect.  The  complete 
results  of  these  tests  are  discussed  in  Chapter  IV. 

3.4.2  Internal State

Backpropagation  networks  have  no  temporal  memory;  they  only  train  or  respond 
to  the  data  at  the  network  inputs  during  each  iteration.  This  property  makes  these 
networks  unsuitable  for  training  on  patterns  that  occur  over  a  series  of  iterations.  To 
recognize  a  pattern  over  time,  a  network  must  maintain  some  form  of  internal  memory  or 
state  over  one  or  more  time  intervals.  A  test  of  this  function,  as  discussed  by  Williams 
and  Zipser(17)  and  documented  in  Lindsey's  thesis(7),  consists  of  presenting  the  network 
with data vectors of 4 inputs labeled a, b, c and d. Within each data vector one input
randomly selected is valued at 1, the others are zero. The output of the network is
normally zero, except for the interval immediately after a valid b input (b = 1) follows a
valid a input. When this occurs, the desired network output is 1 for one time interval.
Inputs c and d have no effect on the desired network output. Training and test files for
this problem were created using different random number seeds, so that the order of
inputs, internal states and intervals between valid a and b inputs were varied.
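
The desired outputs for this test can be generated with a small state machine, sketched in C below. The names are hypothetical, each data vector is represented by the index of its single active input, and one detail is an assumption not stated in the text: a b input clears the remembered a, so a second b produces an output only after a new a has occurred.

```c
enum { SYM_A, SYM_B, SYM_C, SYM_D };

/* Fill target[] with the desired output for each time interval.
 * The target is 1 in the interval immediately after a b that follows an a
 * (with any number of c or d inputs in between), and 0 otherwise.
 * ASSUMPTION: a b input resets the state, whether or not it fired. */
void internal_state_targets(int len, const int *symbols, double *target)
{
    int t;
    int armed = 0;  /* 1 once an a has been seen and not yet consumed */

    for (t = 0; t < len; t++)
        target[t] = 0.0;

    for (t = 0; t < len; t++) {
        if (symbols[t] == SYM_A) {
            armed = 1;
        } else if (symbols[t] == SYM_B) {
            if (armed && t + 1 < len)
                target[t + 1] = 1.0;  /* fires one interval after b */
            armed = 0;
        }
    }
}
```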

The  network  configuration  for  the  internal  state  test  was  four  inputs,  one 
sigmoidal  output,  and  one  sigmoidal  hidden  node.  When  tested,  the  net  apparently  had 
learned  this  task  perfectly,  as  had  the  original  RTRL  algorithm.  The  discussion  of  the 
results  of  the  internal  state  test  is  presented  in  Chapter  IV. 

3.4.3  Second Order IIR Lowpass Filter Simulation

In  this  test,  the  subgrouped  RTRL  network  was  trained  to  simulate  a  second  order 
low  bandpass  Butterworth  filter.  The  filter  algorithm  used  to  produce  the  training  and 
test  output  data  for  the  network  is  described  by  the  equation 

X/]=0.0676(jc[/]+ 2  x[t  -  l]+x[/  -  2D+1.1422 At  ~  l]-0.4124y[r  -  2]  (25) 

The inputs to the network, and to the above algorithm, consisted of several different
data series: a set of random values between -1 and 1, a set of impulses (1 followed by a
string of zeros), a step function (0 0 0 0 0 1 1 1 1 1 ...) and a sampled cosine wave.
The  network  was  trained  on  the  filtered  series  of  random  values,  followed  by  training  on  a 
filtered  series  of  impulses.  The  trained  network  was  then  tested  on  the  filtered  impulse, 
step  function,  cosine  wave  and  random  number  data  sets.  This  training  approach  differs 
from  the  one  described  in  Lindsey's  thesis(7),  where  the  filtered  impulse  series  was  used 
for  training.  The  method  used  for  the  subgrouped  RTRL  resulted  in  faster  training  and 
higher  accuracy  after  training.  The  net  configuration  consisted  of  one  input,  one  linear 
output  node,  and  one  sigmoidal  hidden  node. 
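
The desired filter output in equation (25) can be produced with a few lines of C. The following is a sketch of that difference equation only, not the thesis code; the function name `butterworth_lowpass` is illustrative, and samples before t = 0 are assumed to be zero since the initial conditions are not stated.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Apply the second order lowpass filter of equation (25) to x[0..n-1],
 * writing the response to y[0..n-1]. Samples before t = 0 are taken as
 * zero (an assumption; the initial conditions are not given in the text).
 */
void butterworth_lowpass(const double *x, double *y, size_t n)
{
    double xm1 = 0.0, xm2 = 0.0;  /* x[t-1], x[t-2] */
    double ym1 = 0.0, ym2 = 0.0;  /* y[t-1], y[t-2] */

    assert(x != NULL && y != NULL);
    for (size_t t = 0; t < n; t++) {
        double out = 0.0676 * (x[t] + 2.0 * xm1 + xm2)
                   + 1.1422 * ym1 - 0.4124 * ym2;
        xm2 = xm1;
        xm1 = x[t];
        ym2 = ym1;
        ym1 = out;
        y[t] = out;
    }
}
```

Calling this routine on the random, impulse, step and cosine series yields the desired outputs that would be paired with those inputs in the training and test files.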

The  net  learned  to  emulate  the  Butterworth  filter  with  good  fidelity,  with  only 
minor  deviations  from  the  desired  response.  The  details  and  accuracy  of  the  network's 
filter emulation are discussed in Chapter IV.

3.4.4 RTRL Versus Subgrouped RTRL Performance

The  subgrouped  RTRL  network  was  evaluated  for  performance  by  comparing 
how  quickly  both  of  the  RTRL  algorithms  (original  and  subgrouped)  performed  10 
training  epochs  using  0, 6, 12, 18, 24  and  30  hidden  nodes.  Training  was  performed  on  a 
Sun  Sparc  10  workstation,  and  processing  time  was  obtained  using  the  UNIX  time 
command,  which  reports  how  much  CPU  time  was  dedicated  to  the  process  in  question. 
This  allows  time  data  to  be  taken  without  concern  over  varying  CPU  workloads. 

The  training  file  consisted  of  389  data  vectors  (20  inputs  and  six  outputs)  from  a 
single  voice  data  file.  The  input  data  used  to  train  the  networks  consisted  of  digitized 
voices derived from the TIMIT voice database, which have been processed through the
Payton(8)  auditory  model  algorithms.  This  training  data  required  the  RTRL  networks  to 
differentiate  between  six  classes  of  phonemes  (nasals,  vowels,  stops,  fricatives,  silence 
and  liquid-glides).  Training  runs  with  the  subgrouped  RTRL  network  were  performed 
twice,  first  without  allowing  weight  update  skipping,  and  the  second  time  with  the  error 
threshold for performing weight updates set at 0.00001.

The  subgrouped  RTRL  network  trained  in  substantially  less  time  than  the  RTRL 
algorithm,  but  the  RTRL  network  showed  a  higher  average  accuracy  in  identifying  the 
phoneme  classes  as  compared  to  either  subgrouped  RTRL  network.  Skipping  weight 
updates  in  the  subgrouped  RTRL  network  incurred  a  small  penalty  in  network  error,  but 
depending  on  the  application,  this  may  be  offset  by  the  increase  in  training  speed.  The 
time  required  for  training,  and  the  increase  in  processing  speed  for  these  network 
configurations,  is  discussed  in  Chapter  IV. 

3.5  Applications 

Since  the  strength  of  the  RTRL  algorithm  is  in  the  ability  to  deal  with  data  that 
changes  over  time,  the  subgrouped  RTRL  was  applied  to  two  time  dependent  problems. 
The  first  deals  with  testing  the  predictive  ability  of  the  network,  using  the  opening  value 
of the pound at the London Exchange for training and then testing the network. The other
problem  deals  with  classification  of  time  dependent  data;  image  classification  based  on 
feature  changes  over  time. 

3.5.1 London Exchange Prediction

It  is  the  dream  of  every  financial  analyst  to  possess  a  sure  method  for  predicting 
the  value  of  a  stock,  commodity  or  currency  at  some  point  in  the  future.  One  potential 
method  for  this,  evaluated  in  this  thesis,  is  to  present  a  time  dependent  neural  network 
with  a  sequence  of  values  (daily  pound  exchange  rates)  over  time,  with  desired  output 
being  the  value  at  some  point  in  the  future. 

The  data  used  to  train  the  network  was  derived  from  the  London  Exchange,  and 
consisted  of  the  opening  exchange  value  of  the  pound  over  a  period  of  one  year.  The 
desired  output  provided  to  the  network  was  the  same  data  sequence,  shifted  in  time  one 
day. At any particular time t, the desired output of the net would be the next input value
at time t+1. The network consisted of one input, one linear output, and three sigmoidal
hidden  nodes.  It  was  trained  for  500  epochs,  with  an  initial  learning  rate  of  0.0001.  The 
net  was  then  tested  on  the  opening  exchange  rates  for  a  different  year. 
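
The one day shift between input and desired output can be sketched as follows. This is an illustration of the data preparation described above, not the thesis code; the function name `make_next_day_pairs` and the array layout are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Build one-day-ahead training pairs from a series of n daily values.
 * For each t in 0..n-2: input[t] = series[t], target[t] = series[t+1].
 * Returns the number of pairs written (n - 1, or 0 when n < 2).
 */
size_t make_next_day_pairs(const double *series, size_t n,
                           double *input, double *target)
{
    assert(series != NULL && input != NULL && target != NULL);
    if (n < 2) {
        return 0;
    }
    for (size_t t = 0; t + 1 < n; t++) {
        input[t] = series[t];
        target[t] = series[t + 1];
    }
    return n - 1;
}
```

A series of n opening values therefore yields n - 1 training pairs, since the last value has no following day to serve as its desired output.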

While  the  net  learned  to  closely  match  the  desired  response,  examination  of  the 
plotted  net  output  shows  that  it  consistently  lagged  behind  the  desired  (future)  output. 
This plot of the results, and the future of this network as a financial analyst, are discussed
in  Chapter  IV. 

3.5.2 Vehicle Image Classification

For  the  application  of  the  subgrouped  RTRL  network  to  the  problem  of  image 
classification,  the  net  had  to  associate  sequences  of  single  value  codewords  with  the 
vehicle  the  sequence  had  been  derived  from. 

The sequences of codewords or feature vectors used to train and test the network
were derived from the 3 dimensional CAD representations of five different vehicles: an
M-60 tank, an M35 truck, a BTR60 armored personnel carrier, a T62 tank, and an M2
infantry  fighting  vehicle.  The  CAD  images  of  each  vehicle  were  captured  from  multiple 
points  above  and  around  the  vehicle  representation,  to  uniformly  cover  possible 
perspective  points  for  viewing  the  vehicle.  The  multiple  images  of  the  five  vehicles  were 
processed(3),  and  the  features  extracted  into  64  possible  states,  represented  with 
codeword  values  of  0  -  63.  Sequences  of  the  codewords  represented  a  series  of  discrete 
perspectives  or  image  frames  of  a  vehicle,  changing  over  time  as  the  viewer  perspective 
point  changes. 

The  64  codewords  did  not  in  themselves  represent  any  of  the  vehicles;  each  may 
be  found  in  a  sequence  associated  with  any  of  the  five  vehicles.  Instead,  it  is  the 
sequencing  of  the  codewords  that  differentiates  between  the  vehicles. 

The  data  files  associated  with  each  of  the  vehicles  contained  200  sequences  of 
codewords  of  four  different  lengths;  50  sequences  each  of  14,  16,  18  and  20  codewords. 
The  five  data  files  were  combined,  with  each  sequence  associated  with  a  vehicle  category. 
Categories  were  represented  by  six  network  outputs,  one  for  each  of  the  vehicles  plus  one 
for  the  "header"  information  between  the  sequences.  The  codewords  were  represented  to 
the  network  in  binary  form,  with  the  header  assigned  a  value  of  one,  and  codewords  0  - 
63 presented as 0000010 to 1000001. The order of the codeword sequences in the
datafile  was  randomized,  and  the  first  90%  of  the  sequences  were  used  as  training  data  for 
the  network. 
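
The binary presentation described above amounts to adding an offset of two to each codeword, so that the value 1 stays reserved for the header. The sketch below illustrates this mapping; the function names are illustrative, and presenting the most significant bit on the first network input is an assumption, since the bit ordering is not stated.

```c
#include <assert.h>

#define CODE_BITS 7

/*
 * Write the 7-bit binary pattern for a value into bits[0..6], one network
 * input per bit, most significant bit first (assumed ordering).
 */
void value_to_bits(unsigned value, int bits[CODE_BITS])
{
    assert(value < (1u << CODE_BITS));
    for (int i = 0; i < CODE_BITS; i++) {
        bits[i] = (int)((value >> (CODE_BITS - 1 - i)) & 1u);
    }
}

/* Codewords 0..63 map to 0000010..1000001, i.e. an offset of two. */
void codeword_to_bits(unsigned codeword, int bits[CODE_BITS])
{
    assert(codeword <= 63u);
    value_to_bits(codeword + 2u, bits);
}
```

Under these assumptions the header would be presented with `value_to_bits(1, bits)`, giving 0000001.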

The  network  consisted  of  seven  inputs  (binary  representation  of  codewords),  six 
sigmoidal  outputs,  and  six  sigmoidal  hidden  nodes.  The  desired  output  values  used  to 
train the network were delayed two time periods, so that the network "saw" the desired
output  at  time  t  that  corresponded  with  the  input  presented  at  time  t  -  2.  The  initial 
learning  rate  of  the  network  was  0.01,  momentum  was  set  at  0.98,  and  the  net  was  trained 
for 1000 epochs. After training, the network was tested on the remaining 10% of the
randomized  datafile.  The  net  scored  a  99+%  accuracy  in  identifying  sequences  with  the 
correct  vehicular  image.  Chapter  IV  expands  on  the  results  of  this  application,  with  a 
discussion  of  the  network's  performance. 
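
The two period delay applied to the desired outputs can be illustrated with a short routine. This is a sketch under stated assumptions rather than the thesis code: the name `delay_targets` is illustrative, and the value used for the first `delay` steps (here a caller supplied `fill`) is not specified in the text.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Delay a target sequence by `delay` time steps:
 * delayed[t] = target[t - delay] for t >= delay, and `fill` before that.
 * The fill value for the first steps is an assumption.
 */
void delay_targets(const double *target, double *delayed, size_t n,
                   size_t delay, double fill)
{
    assert(target != NULL && delayed != NULL);
    for (size_t t = 0; t < n; t++) {
        delayed[t] = (t >= delay) ? target[t - delay] : fill;
    }
}
```

With `delay` set to 2, the desired output seen at time t corresponds to the input presented at time t - 2, as described above.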

3.6  Summary 

The subgrouped RTRL algorithm, and the modifications made to the algorithm in
the development of the C code used for this thesis, were described. The methodology for
the testing of the subgrouped RTRL was also discussed. The results of these tests
demonstrate how the performance of the subgrouped RTRL algorithm relates to the
RTRL algorithm described by Lindsey(7), as well as how the network performs at
prediction and classification based on time varying phenomena. Chapter IV contains the
results  and  discussion  of  these  tests. 

IV Results and Discussion


The  history,  theory  and  testing  of  the  subgrouped  RTRL  algorithm  were  discussed 
in Chapter III. This chapter reviews the operating parameters of the network and their
effects,  and  the  results  of  the  tests  conducted  to  demonstrate  the  network's  abilities. 

The  subgrouped  RTRL  was  tested  to  quantify  the  impact  of  the  various  operating 
parameters  (momentum,  minimum  derivative  factor,  teacher  forced  learning,  weight 
update  skipping,  continuity  between  epochs)  that  had  been  added  to  the  net  to  enhance 
performance.  The  net  was  then  tested  to  determine  how  the  subgrouping  of  the  network 
caused  the  capabilities  of  the  network  to  change  from  that  of  the  RTRL  algorithm,  using 
the performance described in Lindsey's(7) thesis as a reference.

The  problems  presented  to  the  subgrouped  RTRL  net  as  potential  applications 
were  twofold:  testing  the  net  as  a  predictor  using  the  daily  opening  values  of  the  British 
pound  as  training  data;  and  testing  the  net  as  an  image  classifier  based  on  learning  vector 
quantized  codewords  derived  from  vehicle  images. 

4.1  Network  Parameters 

To  demonstrate  the  effects  of  the  different  operating  parameters  on  the 
performance  of  the  subgrouped  RTRL  network,  several  of  the  factors  (initial  learning  rate, 
momentum,  minimum  derivative  factor,  weight  update  skipping)  were  varied  during 
network  training.  The  training  file  used  was  a  Payton  (8)  model  processed  digitized  voice 
file, derived from the TIMIT database. This file contained 389 data vectors, and was set
up  to  train  the  net  to  provide  six  outputs,  one  for  each  of  the  broad  phoneme  classes. 

This  data  set  was  chosen  as  an  example  because  it  was  difficult  enough  that  the 
network  does  not  completely  solve  it,  reaching  a  maximum  accuracy  of  approximately 
80%.  It  was  believed  that  this  environment  would  help  to  demonstrate  the  effects  of  the 
network's  parameters,  more  so  than  a  problem  where  the  error  rapidly  drops  to  a  low 
value. The initial learning rate (alpha) for the momentum, minimum derivative factor,
and  weight  update  skipping  trials  was  set  at  0.01,  with  a  network  configuration  of  20 
inputs,  6  sigmoidal  outputs,  and  12  sigmoidal  hidden  nodes.  Training  time  was  set  at  200 
epochs.  The  default  settings  of  the  parameters  (aside  from  those  varied  for  the  test)  are: 

Initial  learning  rate  (alpha)  =  0.01 

Momentum  =  0.0 

Sigmoidal  derivative  minimum  =  0.01 

Output  is  sigmoidal 

No  teacher  forced  learning 

Weight  updates  skipped  if  error  <=  0.0 

End  of  training  epoch  not  continuous  with  beginning  of  next  epoch 

The effect of making the training data and network operation continuous over
different epochs (continuity between epochs) is demonstrated separately, while training
the net to emulate the impulse response of a Butterworth filter, because this option
was added to eliminate a phenomenon found while training the network for that
task.

Each line on the graphs shown in this section averages the results of ten training
runs,  using  different  initial  values  of  the  randomized  weights.  Reported  net  accuracy  was 
based  on  matching  the  desired  output  category  with  the  network  output  with  the  highest 
activation  value. 
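
The accuracy measure described above (the output with the highest activation is taken as the network's chosen category) can be sketched as follows. The function names and the row-major array layout are illustrative assumptions, not the thesis code.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the largest value in v[0..n-1]; ties go to the lowest index. */
size_t argmax(const double *v, size_t n)
{
    size_t best = 0;

    assert(v != NULL && n > 0);
    for (size_t i = 1; i < n; i++) {
        if (v[i] > v[best]) {
            best = i;
        }
    }
    return best;
}

/*
 * Percentage of data vectors whose most active output matches the desired
 * category. outputs is row-major: n_vectors rows of n_outputs activations.
 */
double winner_take_all_accuracy(const double *outputs, const size_t *desired,
                                size_t n_vectors, size_t n_outputs)
{
    size_t hits = 0;

    assert(outputs != NULL && desired != NULL && n_vectors > 0);
    for (size_t t = 0; t < n_vectors; t++) {
        if (argmax(outputs + t * n_outputs, n_outputs) == desired[t]) {
            hits++;
        }
    }
    return 100.0 * (double)hits / (double)n_vectors;
}
```

For the phoneme problem `n_outputs` would be 6 and `n_vectors` would be 389; how ties between equal activations were resolved in the thesis code is not stated, so the lowest index rule here is an assumption.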

4.1.1 Initial Learning Rate

The  value  of  an  adjustable  learning  rate  can  be  seen  using  the  subgrouped  RTRL 
code  evaluated  in  this  thesis.  In  many  cases  after  the  network  error  levels  off,  a  cut  in  the 
learning  rate  produces  an  immediate  improvement  in  net  accuracy  and  error.  While  the 
net will lower the learning rate if the net error increases, a high initial learning rate is not
necessarily  harmless  to  the  overall  learning  behavior  of  the  net  during  training.  If  the  rate 
is  initially  too  high,  it  may  push  the  weights  to  a  state  that  the  net  must  recover  from  after 
the  rate  is  decreased.  If  the  rate  remains  too  high  the  net  error  tends  to  climb  over  time,  in 
some  cases  to  the  point  of  creating  overflow  errors. 
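
The rule that the learning rate is lowered when the net error increases can be sketched as below. This is only an illustration: the reduction factor `decay`, the comparison of successive epoch errors, and the function name are assumptions, since the exact schedule used by the program is not given in this section.

```c
#include <assert.h>

/*
 * Return the learning rate to use for the next epoch: multiply it by
 * `decay` (0 < decay < 1) whenever the epoch error went up, otherwise
 * leave it unchanged. The decay factor is a hypothetical parameter.
 */
double adapt_learning_rate(double alpha, double prev_error, double error,
                           double decay)
{
    assert(alpha > 0.0 && decay > 0.0 && decay < 1.0);
    return (error > prev_error) ? alpha * decay : alpha;
}
```

Because the rate is only ever reduced after the error has already risen, a rate that starts too high can still move the weights into a poor region before the first reduction takes effect, which is the behavior described above.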

The  effect  of  the  initial  learning  rate  on  net  performance  was  examined  by  training 
the  subgrouped  RTRL  net  with  three  different  alphas  at  initialization  (0.1,  0.01  and 
0.001).  The  configuration  of  the  network  was  20  inputs,  6  sigmoidal  outputs  and  12 
sigmoidal  hidden  nodes.  Figure  13  shows  how  the  different  initial  learning  rates 
impacted  the  network  accuracy  during  training.  As  can  be  seen  from  this  graph,  the  best 
performance was achieved with an initial learning rate (alpha) of 0.01.


Effect  of  Initial  alpha  on  accuracy 


Figure  13:  The  impact  of  different  initial  learning  rates  on  network  accuracy  during  training.  The 
average  standard  deviation  of  the  data  was  6.13. 


The  higher  accuracy  reported  at  the  start  for  the  initial  learning  rate  of  0.1  was 
caused  by  the  net  rapidly  changing  its  weights  to  adapt  to  the  most  recent  inputs.  This 
causes  the  net  to  be  correct  at  time  t,  but  after  passing  time  t  the  weights  would  change 
enough  that  the  same  inputs  might  produce  different  and  erroneous  outputs.  When  the 
net  training  is  halted  and  tested  while  in  this  reported  higher  accuracy  state  (circa  5 
epochs) the net performs poorly, and the test reports a low accuracy result. This is due to
the fact that the test uses fixed weights, rather than the rapidly adapting weights generated
by  training  with  a  high  alpha  that  creates  temporary  error  minima  as  it  goes. 

4.1.2  Momentum 

The  inclusion  of  a  momentum  term  (p)  as  a  means  of  increasing  the  learning  rate 
of  a  neural  net  is  a  well  understood  mechanism  for  improving  learning  performance.  By 
retaining  a  fraction  of  the  weight  update  from  the  previous  learning  iteration  and  adding  it 
to the current weight update, weight changes tend to continue along the same direction
over  time.  This  has  the  tendency  of  damping  oscillations  in  the  network  as  it  learns,  and 
maintaining  the  progression  of  the  net  to  an  energy  minimum.  The  effect  of  the 
momentum  term  in  the  broad  class  phoneme  identification  problem  is  shown  in  Figure 
14. 
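
The momentum mechanism described above can be written as a weight update that carries forward a fraction of the previous update. The following is a generic gradient descent sketch, not the thesis code; the names `momentum_step`, `delta` and `grad` are illustrative, and `grad` stands for the error gradient that the RTRL algorithm would supply for each weight.

```c
#include <assert.h>
#include <stddef.h>

/*
 * One gradient descent step with momentum on n weights.
 * On entry delta[i] holds the previous weight update; on exit it holds
 * the update just applied:
 *   delta[i] = -alpha * grad[i] + momentum * delta[i]
 *   w[i]    += delta[i]
 */
void momentum_step(double *w, double *delta, const double *grad, size_t n,
                   double alpha, double momentum)
{
    assert(w != NULL && delta != NULL && grad != NULL);
    assert(momentum >= 0.0 && momentum < 1.0);
    for (size_t i = 0; i < n; i++) {
        delta[i] = -alpha * grad[i] + momentum * delta[i];
        w[i] += delta[i];
    }
}
```

With a momentum of 0 this reduces to plain gradient descent, which corresponds to the default setting listed in section 4.1.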


Effect  of  momentum  on  net  accuracy 


Figure  14:  The  accuracy  of  the  network  over  200  epochs  is  shown,  with  the  momentum  term  set 
at  0, 0.5  and  0.9.  The  average  standard  deviation  of  the  data  was  3.79. 


The  network  exhibited  a  higher  accuracy  during  training  with  a  momentum  of  0.9. 
Thus the benefit of momentum appears to apply to RTRL type networks as
well as to standard backprop networks, at least for this type of problem.

4.1.3 Minimum value for output derivative factor

Establishing a minimum value for the sigmoidal derivative factor in the
weight update formula has a profound effect on the learning rate for a certain class of
problems. This class includes those problems for which the desired output(s) of the
network  are  either  zero  or  one,  usually  to  signify  Boolean  decisions  (yes  or  no)  and  in 
determining  membership  in  categories.  Figure  15  shows  how  setting  the  minimum  level 
for  the  sigmoidal  derivative  affected  the  learning  rate  of  the  subgrouped  RTRL  network 
when  solving  the  broad  phoneme  category  problem. 


Impact  of  sigmoidal  minimum  on  accuracy 


Figure  15:  The  effects  of  setting  a  minimum  sigmoidal  derivative  factor  for  error  backpropagation. 
Note  how  the  network  did  not  progress  when  a  minimum  factor  was  not  set.  The  average 
standard  deviation  for  the  lines  was  5.25. 


The  addition  of  the  minimum  derivative  term  to  the  RTRL  network  was  perhaps 
the  most  effective  modification  in  terms  of  enhancing  learning  for  any  categorization 
problem.  Prior  to  this  modification,  the  RTRL  network  would  "latch"  and  not  progress 
unless  the  learning  constant  was  set  very  low.  This  reduced  the  effective  learning  rate  to 
an  unacceptable  level,  and  the  network  appeared  to  be  unsuitable  for  differentiating 
between  several  categories. 
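
The "latching" behavior follows from the shape of the sigmoid derivative: for a logistic output y the derivative factor is y(1 - y), which approaches zero as y saturates toward 0 or 1, so a confidently wrong output produces almost no weight change. A minimal sketch of the floor is shown below; it assumes a logistic sigmoid, and the function name is illustrative rather than taken from the thesis code.

```c
#include <assert.h>

/*
 * Derivative factor of a logistic sigmoid expressed through its output y,
 * y * (1 - y), floored at min_deriv so that saturated outputs still
 * receive a usable weight update.
 */
double sigmoid_deriv_floored(double y, double min_deriv)
{
    double d;

    assert(y >= 0.0 && y <= 1.0 && min_deriv >= 0.0);
    d = y * (1.0 - y);
    return (d < min_deriv) ? min_deriv : d;
}
```

With the default minimum of 0.01, an output of 0.999 would use a factor of 0.01 instead of roughly 0.001, while outputs near 0.5 are unaffected.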




It  is  also  noteworthy  that  the  runs  with  the  highest  sigmoidal  minimum  set 
(minimum = 0.1) reported a higher accuracy at first, which then dropped in much the same
way  as  the  nets  training  with  a  high  initial  learning  rate.  Again,  when  tested  after  a  few 
(~5)  training  epochs  these  nets  report  a  low  accuracy,  because  the  net  was  adapting  too 
quickly  to  the  inputs.  Based  on  having  applied  this  minimum  factor  to  a  wide  range  of 
different  problems,  the  optimum  level  for  the  derivative  minimum  appears  to  be  on  the 
order  of  0.01  for  almost  all  training  problems  where  a  sigmoidal  network  output  is 
required. 

4.1.4  Teacher  forced  learning 

As  stated  in  Chapter  III,  teacher  forced  learning  can  cause  the  network  to  train 
faster, but may reduce the network accuracy once the constraint of passing only the
correct  outputs  back  as  network  inputs  is  removed,  such  as  during  testing  of  the  trained 
network.  To  demonstrate  the  use  of  teacher  forced  learning  therefore,  not  only  must  the 
learning  rates  with  and  without  teacher  forced  learning  be  examined,  but  the  accuracies  of 
the network after training must be checked as well. The differences in reported accuracy
during training are shown in Figure 16.


Effect  of  teacher  forced  learning  on  net  accuracy 


Figure 16: A comparison of learning rates with and without teacher forced learning. The average
standard deviation for the data used to plot the graph lines was 4.98.


As can be seen from Figure 16, the addition of teacher forced learning for this
problem had little impact on the learning rate of the network. Testing the network on the
training data file showed a 70.4 percent accuracy (σ = 17.69) in identifying the phoneme
groups with teacher forced learning, and a 79.4 percent accuracy without. (This was in
part due to an outlying test result of 20.3% for one of the ten networks trained using
teacher forced learning, pulling the average down. Without this outlying value, the
teacher forced learning nets tested at an average 75.85% accuracy.) For this type of problem
teacher  forced  learning  provided  no  real  gains,  but  instead  induced  a  loss  in  phoneme 
group  recognition  performance. 
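
The difference between the two training modes lies in which values are fed back as recurrent inputs for the next time step. The sketch below illustrates that selection; it is not the thesis code, and the function name, the `has_target` flags and the array layout are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Choose the values fed back as recurrent inputs for the next time step.
 * With teacher forcing, nodes that have a desired value feed back that
 * desired value; otherwise the node's actual activation is used.
 * has_target[i] is nonzero when node i has a desired value at this step.
 */
void select_feedback(const double *actual, const double *desired,
                     const int *has_target, size_t n_nodes,
                     int teacher_forced, double *feedback)
{
    assert(actual != NULL && feedback != NULL);
    for (size_t i = 0; i < n_nodes; i++) {
        int force = teacher_forced && desired != NULL
                    && has_target != NULL && has_target[i];
        feedback[i] = force ? desired[i] : actual[i];
    }
}
```

During testing `teacher_forced` would be 0, so the network must rely on its own, possibly erroneous, outputs. This is the change in operating conditions that can lower the tested accuracy of a net trained with teacher forcing.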

4.1.5  Skipping  Weight  Updates  for  Learned  Outputs 

The  computation  of  the  p  matrix  is  the  most  time  consuming  routine  in  the  RTRL 
network,  and  therefore  the  primary  driver  for  the  investigation  of  optimization  methods  to 
speed  up  learning  for  this  algorithm.  The  addition  of  the  weight  (and  p  matrix)  update 
skipping  can  cut  the  time  required  for  processing  each  epoch  of  data  up  to  50%, 
significantly  improving  the  training  rate  of  the  network. 


Impact  of  weight  update  skipping  on  net  accuracy 


Figure  17:  The  effects  of  skipping  iterations  during  learning  when  error  is  below  the  skip  threshold. 
The  average  standard  deviation  for  the  lines  was  4.29. 


As can be seen in Figure 17 however, skipping weight updates for outputs with
low errors does impact the accuracy of the network to some small extent. Paradoxically,
the nets training with a 0.1 error threshold show a higher overall accuracy
during training than those training with lower error thresholds. Skipping more of the
weight updates may allow the net to focus more on inputs that are outside the average
location in the input space, or perhaps cause the net to learn to classify some inputs that
may be in the minority, and therefore normally not caught by the net. The effect of
changing  the  error  threshold  on  network  accuracy  was  the  primary  reason  why  the  skip 
threshold  was  made  to  be  changeable  by  the  network  user;  the  user  can  determine  where 
the  error  threshold  should  be  set. 
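
The skip decision can be sketched as a simple test made before the weight and p matrix updates. The version below is an assumption about how such a test might look, not the thesis code: it compares the magnitude of each output error against the user supplied threshold and skips the update only when every output is within it.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return 1 when every output error magnitude is at or below the threshold,
 * meaning the weight (and p matrix) update for this iteration can be
 * skipped, and 0 otherwise. Using the absolute error per output is an
 * assumption; the error measure is not specified in this section.
 */
int should_skip_update(const double *desired, const double *actual,
                       size_t n_outputs, double threshold)
{
    assert(desired != NULL && actual != NULL && threshold >= 0.0);
    for (size_t k = 0; k < n_outputs; k++) {
        double e = desired[k] - actual[k];
        if (e < 0.0) {
            e = -e;
        }
        if (e > threshold) {
            return 0;
        }
    }
    return 1;
}
```

With the default threshold of 0.0 this test skips an update only when the outputs match the desired values exactly, which in practice disables skipping.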

4.1.6 Continuity of Recurrence Between Epochs

In  paragraph  3.3.9,  the  problem  caused  by  zeroing  out  the  network  outputs  at  the 
end  of  each  training  epoch  was  discussed.  The  effect  of  not  zeroing  out  the  net  outputs 
was  evaluated  by  training  the  RTRL  network  to  emulate  a  low  pass  Butterworth  digital 
filter.  The  net  was  trained  using  the  methods  detailed  in  paragraph  3.4.3,  with  and 
without  the  data  continuous  at  the  ends  of  the  training  epochs.  Each  net  was  trained  first 
on a sequence of random floating point values (between -1 and 1) with their low pass filter
response, followed by training on impulses (0 0 0 1 0 0 ...) coupled with the filter's
impulse response. The nets were then tested on the impulse response training data. The
reactions of the networks to the impulse are shown in Figure 18.

As  can  be  seen  in  Figure  18b,  removing  the  discontinuity  between  the  epochs 
removed  the  additional  impulse  response,  shown  in  Figure  18a.  Using  this  option  caused 
the  network  to  train  to  a  closer  match  of  its  output  to  the  desired  response,  to  the  extent 
that  the  lines  (desired  vs.  output)  are  almost  indistinguishable. 


Figure 18: These charts show the impulse response of the RTRL network without
(a) and with (b) continuity between epochs. The net was trained to emulate a
Butterworth filter.


4.2  Subgrouped  RTRL  Functional  Capabilities 

So  far  in  this  chapter  only  the  parameters  of  the  network  have  been  discussed. 
These  parameters  can  more  or  less  enhance  the  learning  efficiency  of  the  RTRL  network 
program,  but  do  not  necessarily  demonstrate  the  subgrouped  RTRL  network's 
characteristics  or  capabilities.  Subgrouping  the  RTRL  network  could  have  negatively 
impacted  the  ability  of  the  network  to  perform  various  functions.  This  section  of  the 
thesis  therefore  evaluates  the  subgrouped  RTRL  network's  properties  and  abilities,  as 
compared to the RTRL algorithm described in Lindsey's(7) thesis. Several tests described
in  that  thesis  were  therefore  used  as  a  benchmark  to  measure  the  impact  of  subgrouping 
the  network. 

4.2.1 Exclusive OR

As in Lindsey's(7) thesis, the first problem to be examined to demonstrate the
capabilities  of  the  network  is  the  exclusive  OR  (XOR).  As  described  in  section  3.4.1,  the 
net  was  trained  using  1024  binary  training  vectors,  with  the  two  inputs,  one  sigmoidal 
output  neuron  and  four  hidden  neurons.  The  outputs  provided  in  the  training  file  were 
delayed  by  two  time  steps.  After  20  epochs,  the  network  established  100%  accuracy  with 
a mean squared error of 0.030. The criterion for a valid response from the network was an
error of less than 0.5, meaning that the mean squared error had to be less than 0.125. The
network  was  then  tested  on  a  separate  binary  XOR  data  set  created  with  a  different 
random  number  seed,  and  was  found  to  have  a  100%  accuracy  on  the  test  file  as  well. 
This demonstrated that for binary (0 and 1) data, the net was able to generalize the XOR
problem.

Figure 19: Plot of the subgrouped RTRL network's hits and misses for the third
analog XOR test set. Network accuracy for this test set was 91.2%. Hits are
designated with open diamonds, while misses are shown with filled diamonds.

The  next  step  in  training  the  network  to  recognize  the  XOR  problem  was  to  use  an 
analog test set, with two input data values between 0 and 1. If one input was greater than
0.5 and the other less than 0.5, the output (delayed two time steps) was 1, otherwise the
output  was  0.  After  training  the  network  over  300  training  epochs  using  512  training 
vectors,  the  net  achieved  an  accuracy  of  99.6%.  The  net  was  then  tested  on  three  analog 
XOR  test  files,  and  received  accuracies  of  92.6%,  94.5%  and  91.2%.  This  corresponded 
with  the  results  seen  by  Lindsey(7). 
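
The desired output for the analog XOR data follows directly from the rule stated above. A minimal sketch in C is given below; the function name is illustrative, and the two step delay of the desired output would be applied separately when the training file is written.

```c
#include <assert.h>

/*
 * Desired analog XOR output for inputs a and b in [0, 1]:
 * 1 when one input is above 0.5 and the other is below it, otherwise 0.
 */
int analog_xor_target(double a, double b)
{
    assert(a >= 0.0 && a <= 1.0 && b >= 0.0 && b <= 1.0);
    return ((a > 0.5 && b < 0.5) || (a < 0.5 && b > 0.5)) ? 1 : 0;
}
```

Inputs lying exactly on 0.5 produce a desired output of 0 under this rule, since neither strict inequality holds.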

As  can  be  seen  in  Figure  19,  the  misses  in  the  third  test  file  (91.2%  accuracy)  do 
not  correspond  to  the  axis  between  the  decision  areas,  but  are  scattered  throughout  the 
test  space.  This  implies  that  the  net  is  solving  a  temporal  path  through  the  test  data, 
rather than differentiating each pair of inputs as valid or not.

It  was  interesting  to  note  that  neither  the  original  RTRL  code  nor  the  subgrouped 
RTRL code could solve the XOR problem for an output time delay of less than 2 time
steps.  This  may  indicate  that  to  solve  the  XOR  problem,  the  data  must  recursively  pass 
through  the  hidden  nodes  at  least  twice,  effectively  solving  the  problem  with  two  or  more 
hidden layers. Because of this, it may be that for any problem, a balance must be
struck  between  delaying  the  outputs  long  enough  to  use  multiple  hidden  layers  in  the 
problem,  and  having  the  outputs  close  enough  in  time  to  the  associated  inputs  that  the  net 
can  infer  a  causal  connection  between  the  two. 

4.2.2  Internal  State 

The  ability  of  the  subgrouped  RTRL  network  to  internalize  a  time  dependent  state 
was  demonstrated  by  the  test  described  in  section  3.4.2  of  this  thesis.  The  net  was  trained 
using  four  binary  inputs  (a,  b,  c  and  d),  with  the  desired  response  of  recognizing  the 
occurrence  of  the  first  valid  b  input  (value  =1)  after  a  valid  a  input.  After  training  on  the 
95  input  vectors,  the  subgrouped  RTRL  network  obtained  an  accuracy  of  100%,  given  a 
decision  threshold  of  0.5  for  valid  (high)  versus  nonvalid  (low)  network  outputs.  Figure 
20  shows  the  network  output  over  the  95  training  vectors  versus  the  desired  output. 

Figure 20: Internal State Training Results


Figure 21: Internal State Testing Results

The  network  was  then  tested  on  a  different  internal  state  data  file,  and  the 
subgrouped  RTRL  net  again  demonstrated  a  100%  accuracy  level  (Figure  21).  The 
network  therefore  was  able  to  generalize  the  solution  to  the  internal  state  problem,  as  had 
the  non-subgrouped  RTRL  network  used  by  Lindsey(7)  in  his  thesis. 

4.2.3 Second Order IIR Lowpass Filter Simulation

The  subgrouped  RTRL  network  was  trained  to  emulate  a  lowpass  Butterworth 
filter, as was described in section 3.4.3. The training files were generated by calculating the
Butterworth filter response to a binary impulse string (0 0 0 0 0 1 0 0 0 0 0 0 ...), and to a
series of random values between -1 and 1. Training took place in two steps, first training
the network using the random number Butterworth response, and then continuing training
on  the  impulse  string  training  file.  This  was  done  because  the  network  appeared  to  "catch 
on"  to  emulating  the  filter  response  faster  with  the  random  value  training  file,  perhaps  due 
to  the  richer  source  of  input  data  to  associate  with  the  desired  output. 




4.2.3.1 Network Impulse Response

The impulse response and frequency response of the network are shown in Figure
22.  The  frequency  response  was  plotted  by  performing  a  fast  Fourier  transformation 
(FFT)  of  the  desired  network  response,  and  of  the  network's  trained  response  to  a  binary 
impulse. 


Figure  22:  Impulse  response  and  frequency  response  of  the  subgrouped  RTRL  network  after 
training  as  a  Butterworth  filter 

The  impulse  frequency  response  of  the  trained  network  matches  the  desired 
frequency  response  well,  except  for  deviations  at  both  the  high  and  low  frequencies.  This 
match  is  closer  than  was  observed  by  Lindsey,  which  may  be  due  to  several  factors.  First, 
Lindsey's  training  file  provided  the  net  with  the  impulse  at  the  first  iteration,  while  the 
training file used for the subgrouped RTRL network placed the impulse at t=50. Also,
the training data for the subgrouped RTRL network was continuous, i.e. the delayed
output from the last iteration was placed as the desired output at t=0. The network
outputs were not zeroed at the end of each epoch (see section 4.1.6), removing the
discontinuity at t=0. This eliminated the spurious impulse response discussed in section
4.1.6.

4.2.3.2 Unit Step Response

After training the network to emulate the Butterworth impulse response, it was
tested on a file containing a step function (0 0 0 0 1 1 1 1 1 1), with the IIR filter response
as the desired output data. Figure 23 shows the network output versus the desired
Butterworth filter response. The subgrouped RTRL network did not match the overshoot,
nor did the final steady state output value match that of the desired output.


Figure 23: Plot of the Butterworth filter's response to a unit step function versus the subgrouped
RTRL network's response. Note the lack of overshoot and the lower steady state output of the
network.


The lack of overshoot indicates that the RTRL filter is slightly overdamped in its
response, while the lower steady state output shows that the network possesses a DC
offset after transitioning to the higher state. This DC offset may be caused to some extent
by training the network using continuous recurrent outputs between epochs (section 4.1.6).
When the network outputs are zeroed at the transition between training epochs, a positive
DC bias of approximately 0.007 appears at the network output whenever the value should
be approaching zero. Making the epochs continuous appears to nearly eliminate this bias.
The lower steady state output level for the unit step response function may, however, be a
byproduct of removing the DC bias during training.

4.2.3.3 Sinusoidal Response

The IIR Butterworth filter trained network was tested using a sinusoidal signal as
the input, coupled with the Butterworth filtered response as the desired output. As in
Lindsey's(7) thesis, the sinusoid consisted of two cycles of a cosine wave divided into 128
sample points. The subgrouped RTRL network closely matched the desired filter
response, as can be seen in Figure 24.

Figure 24: The subgrouped RTRL network closely matched the Butterworth filtered response to a
cosine input sinusoid. The frequency response plot is log-linear.

The frequency domain representations of the network output and Butterworth filter
response matched rather closely, indicating that the RTRL network was indeed emulating
the Butterworth filter for sinusoidal inputs.

4.2.3.4 Pseudo-Random Number Sequence Response

The final test of the subgrouped RTRL algorithm's ability to emulate a
Butterworth IIR filter was in the form of the network matching the IIR filtered response to
a series of random values, ranging from -1 to 1. The series of random values generated
for this test could be interpreted as representing the sampling of a broad spectrum noise
signal source. The impulse response trained RTRL network was tested using the random
values as the net input, with the Butterworth algorithm filtered response (delayed 1 time
step) provided as the desired output.

Figure 25: a. A segment of the Butterworth filtered random noise signal data, with the subgrouped
RTRL network's output. b. A comparison of the desired frequency response to a noisy (random)
signal, versus the subgrouped RTRL output.

Figure 25a displays one segment of the filtered signal data with the network's output. The
net was able to very closely match the filtered signal, almost to the point of being
indistinguishable from the desired signal used as a reference.

A comparison of the spectral characteristics of the filtered noisy signal versus the
output generated by the network (Figure 25b) reveals that the network closely matched the
desired frequency response. The close agreement, throughout the spectrum evaluated,
explains why the network output was able to follow the desired filtered response to the
noisy signal so accurately. The degree of similarity between the two signals may be due
in part to the stage in training the neural network that was performed using a random
noise signal as input prior to training on the impulse response.

4.2.3.5  RTRL  Versus  Subgrouped  RTRL  Performance 

The  previously  described  tests  demonstrate  the  comparable  capabilities  of  the 
subgrouped  RTRL  network  and  the  original  RTRL  algorithm  described  in  Lindsey's 
thesis(7).  While  parameters  may  be  changed  to  enhance  the  learning  accuracy  of  the 
network,  with  the  exception  of  the  skipping  of  the  weight  update  they  have  no  effect  on 
how  fast  the  network  learns.  The  question  therefore  is,  what  does  subgrouping  the 
network  gain  us? 

This question was answered by comparing the time required to process 10 training
epochs by both algorithms, Lindsey's(7) RTRL program and the subgrouped RTRL
network. The number of training epochs was chosen to be a small number due to the
processing time required by the RTRL network. Longer training runs would show a
slight proportional difference in the time required by the subgrouped RTRL network
employing weight update skipping, as the percentage of data points skipped varies over
time.

To make a fair comparison, the original RTRL code was modified to provide
it with the minimum sigmoidal derivative function, which has a major effect on network
classification performance. Both networks were set with the minimum sigmoidal
derivative factor at 0.01. Each algorithm was tested while varying the number of hidden
nodes, to determine the effect on training time. The networks consisted of 20 inputs, six
sigmoidal outputs, and 0, 6, 12, 18, 24 and 30 hidden nodes. Training data was a single
voice data file derived from the TIMIT voice database and preprocessed using the
Payton(8) auditory system algorithm. The subgrouped algorithm was tested under two
conditions: with weight update skipping disabled, and with the weight update error
threshold set at 0.00001.

Figure 26 shows how the time required to process the 10 training epochs varied
between the different algorithms and with varying numbers of hidden nodes. Although
only six data points each are shown for the different test runs, it can be clearly seen that
the original RTRL code takes much longer to process multi-output problems, and the
difference increases geometrically as the number of hidden nodes increases.

Figure 26: Comparison of network training time between original RTRL code, subgrouped RTRL
code, and subgrouped RTRL with weight update skipping enabled.

Figure 27 shows the increase in processing speed obtained when using the
subgrouped RTRL algorithm. The speedup was calculated by dividing the time required
by the original code by the time required by the subgrouped RTRL networks. The net
not utilizing weight update skipping shows a relatively linear speedup when compared to
the number of network nodes, showing an approximately O(n) speedup caused by the
subgrouping. This is not true of the net that employed weight update skipping, as the
speedup is not constant across the different numbers of hidden nodes. As more hidden
nodes are added, the improvement for the net employing weight update skipping appears
to level off. As the net becomes more complex the average
error per iteration rises, and the weight update skipping occurs with less frequency.
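The speedup figure used here is simply a ratio of two processing times. A minimal sketch of the calculation follows; the names time_training and speedup are hypothetical, and the use of CPU time from clock() is an assumption, as the text does not say how the times were measured.

```c
#include <time.h>

/* Run train_epochs(epochs) once and return the CPU seconds it used.
   Measuring CPU time with clock() is an assumption of this sketch. */
double time_training(void (*train_epochs)(int), int epochs)
{
    clock_t start = clock();
    train_epochs(epochs);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Speedup of the subgrouped code: time taken by the original code
   divided by time taken by the subgrouped code.
   Returns 0.0 if the subgrouped time is not positive. */
double speedup(double t_original, double t_subgrouped)
{
    if (t_subgrouped <= 0.0)
        return 0.0;
    return t_original / t_subgrouped;
}
```

One value of speedup per hidden node count, for each of the two subgrouped configurations, would give the two curves of the kind plotted in Figure 27.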


Figure 27: Increase in processing speed of the subgrouped RTRL networks (with and without
weight update skipping), versus the original RTRL algorithm

If  time  to  process  is  not  the  critical  issue,  then  network  accuracy  must  be 
examined  as  well.  Figure  28  shows  how  the  accuracy  reported  by  the  three  network 
training  runs  differed  over  200  epochs,  when  each  network  used  6  hidden  nodes.  The 
subgrouped  RTRL  algorithms  performed  with  lower  accuracy  than  the  original  non- 
subgrouped  algorithm,  while  employing  the  same  number  of  hidden  nodes. 

The original RTRL code reported a higher average accuracy over the two hundred
training epochs, indicating that the subgrouping does incur some reduction in network
capability. This validates Zipser's(20) observation that subgrouping the net can reduce
the net accuracy.

Figure 28: Comparison of the accuracy reported by the original RTRL algorithm, the subgrouped
RTRL algorithm, and the subgrouped RTRL algorithm with weight update skipping enabled.


4.3  Network  Applications 

In section 4.1 of this thesis the effects of varying the network parameters were
examined, using the broad class phoneme problem to baseline their impact for that
application.  Section  4.2  compared  the  performance  of  the  subgrouped  RTRL  algorithm 
with  the  original  RTRL,  to  examine  what  the  network  lost  (or  gained)  in  speed,  accuracy 
and  capability  when  it  was  subgrouped.  In  this  section,  the  subgrouped  RTRL  network 
was  applied  to  two  time  dependent  problems:  predicting  future  behavior  based  on 
behavior  in  the  past,  and  classification  based  on  sequences  of  feature  changes  over  time. 


4.3.1 London Exchange Prediction

The configuration of the network for this application was one input, one linear
output, and three sigmoidal hidden nodes. The input to the network, one year's worth of
opening market values for the pound in the London Exchange, was paired with the same
data shifted one day ahead in time. This was to train the network to predict what the next
day's opening quote would be. Training was initiated with a learning rate of 0.0001, and
was completed after 500 epochs. The network was then tested using the opening market
values for the pound for a different year, to determine whether the net would match the
desired next day values.

The output of the network is shown in Figure 29, along with the desired output.
Examination of the figure shows that the network consistently lags behind the desired
output. The match, although close, does not demonstrate the net being able to predict the
next day's opening quote. Although an enticing possibility, the RTRL algorithm
apparently cannot be used as a means of predicting changes in the value of the pound
using past performance as training data.
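The lag noted above was judged from the plot. One way such a lag could be quantified, offered only as a sketch and not as a step performed in this thesis, is to find the shift at which the network output best matches the desired output. The names mse_at_lag and best_lag are hypothetical.

```c
#include <stddef.h>

/* Mean squared error between net[t] and target[t - lag] over the overlap. */
static double mse_at_lag(const double *net, const double *target,
                         size_t n, size_t lag)
{
    double sum = 0.0;
    for (size_t t = lag; t < n; t++) {
        double d = net[t] - target[t - lag];
        sum += d * d;
    }
    return sum / (double)(n - lag);
}

/* Return the lag in 0..max_lag at which the net output best matches the
   target (n must be at least 1). A result above 0 means the output
   trails the target by that many samples. */
size_t best_lag(const double *net, const double *target,
                size_t n, size_t max_lag)
{
    size_t best = 0;
    double best_err = mse_at_lag(net, target, n, 0);
    for (size_t lag = 1; lag <= max_lag && lag < n; lag++) {
        double err = mse_at_lag(net, target, n, lag);
        if (err < best_err) {
            best_err = err;
            best = lag;
        }
    }
    return best;
}
```

A best lag of one day on the test year would indicate that the net is reproducing the most recent quote it was shown rather than anticipating the next one.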


Figure  29:  Net  performance  on  test  data  for  London  exchange  rate  prediction 

4.3.2  Vehicle  Image  Classification 

The  application  of  the  subgrouped  RTRL  network  to  the  task  of  classifying 
vehicles  required  some  repeated  attempts  before  the  correct  approach  was  determined. 
Initially,  the  net  was  trained  using  the  sequences  of  codewords  as  a  single  input, 
providing the net with a "signal" that it was hoped would be characteristic of each vehicle.
The net consisted of that single input, plus six sigmoidal hidden nodes and six sigmoidal
output neurons. Five of the sigmoidal outputs each represented one class of vehicle.
The sixth output was used to identify the strings of -1 values used to separate
the  vehicle  sequences.  Training  was  initiated  using  a  learning  rate  of  0.1,  and  the  net  was 
trained  for  200  epochs.  The  net  trained  very  poorly  on  the  sequence  information,  and  so 
the  attempt  was  repeated  using  teacher  forced  learning. 

Adding  this  function  to  the  net  training  approach  appeared  to  have  an  immediate 
and  positive  effect  on  the  network's  ability  to  differentiate  between  the  vehicle  sequences. 
The net reported a score of over 90% within five epochs, and finished after 20 epochs with a
score of 96.7%. The test file scored similarly, with a 97% accuracy rate. This figure
does  not  mean  that  the  net  recognized  97%  of  the  sequences.  Instead,  this  means  that  the 
net  correctly  categorized  that  percentage  of  the  data  points  in  the  file,  with  each  sequence 
consisting  of  14  -  20  data  points,  and  the  header  spacing  between  the  sequences 
containing  six  -1  values. 
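The distinction between scoring data points and scoring sequences can be made concrete with a short sketch of per-point categorical scoring, in which the output with the highest value is taken as the net's choice. This is an illustration under stated assumptions, not the scoring code in recnet: the names argmax and point_accuracy and the row-major array layout are hypothetical.

```c
#include <stddef.h>

/* Index of the largest value in v[0..n-1] (the first one wins on ties). */
static size_t argmax(const double *v, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (v[i] > v[best])
            best = i;
    return best;
}

/* Percentage of data points whose winning output matches the winning
   desired output. out and desired are row-major: points x classes. */
double point_accuracy(const double *out, const double *desired,
                      size_t points, size_t classes)
{
    size_t correct = 0;
    if (points == 0)
        return 0.0;
    for (size_t p = 0; p < points; p++)
        if (argmax(out + p * classes, classes) ==
            argmax(desired + p * classes, classes))
            correct++;
    return 100.0 * (double)correct / (double)points;
}
```

Because header points and vehicle points are counted alike, a score computed this way says nothing directly about how many whole sequences were identified.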

Because  of  the  rapid  training  and  high  accuracy,  the  code  for  the  subgrouped 
RTRL  was  re-examined  to  verify  its  startling  performance  was  valid  for  this  task.  It  was 
found  that  in  the  subroutine  in  which  the  desired  outputs  were  substituted  for  the 
recurrent  network  outputs  (teacher  forced  learning),  the  code  did  not  differentiate  between 
the training and testing of the network. In other words, when the net was being tested
with teacher forced learning selected, the substitution was still occurring; the net was
"cheating" by looking at the answers during test. When this was corrected, the test score
for this task changed to a 46% accuracy.
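The nature of the correction can be sketched as a guard on the substitution step. The code below is a hypothetical illustration of that idea and not an excerpt from recnet.c; the function name select_feedback and its arguments are assumptions.

```c
#include <stddef.h>

/* Choose the values fed back into the recurrent inputs for the next step.
   Desired outputs replace the net outputs only while training with teacher
   forcing enabled; during testing the net always sees its own outputs. */
void select_feedback(const double *net_out, const double *desired,
                     size_t n_out, int teacher_forced, int training,
                     double *feedback)
{
    int substitute = teacher_forced && training;
    for (size_t i = 0; i < n_out; i++)
        feedback[i] = substitute ? desired[i] : net_out[i];
}
```

With the training flag checked, a test run feeds back only what the network itself produced, so the reported score reflects what the net has actually learned.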

To resolve this problem, a different approach had to be taken. The net was
apparently not able to discern each of the codewords as a "state." Instead, the net had
been trained much as it would have been on an analog signal, making codewords adjacent
in state nearly equivalent in value for determining a response. To help the net
differentiate between the codewords as distinct "states," the input values were converted
into binary code. The range of input values had been from -1 (header value) to 63. To
convert this to binary information each input was incremented by two, and then expressed
in binary (0000001 to 1000001).
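The conversion just described can be sketched as follows. The function name encode_codeword and the choice of most significant bit first are assumptions made for illustration; the offset of two and the seven bit width come from the text.

```c
#include <stddef.h>

#define CODE_BITS 7

/* Encode a codeword in -1..63 as seven binary inputs, most significant
   bit first (an assumed ordering), after adding the offset of two.
   Returns 0 on success, -1 if the codeword is out of range. */
int encode_codeword(int codeword, double bits[CODE_BITS])
{
    if (codeword < -1 || codeword > 63)
        return -1;
    unsigned value = (unsigned)(codeword + 2);   /* 1 .. 65 */
    for (size_t i = 0; i < CODE_BITS; i++) {
        unsigned shift = (unsigned)(CODE_BITS - 1 - i);
        bits[i] = (double)((value >> shift) & 1u);
    }
    return 0;
}
```

The header value -1 maps to 0000001 and codeword 63 maps to 1000001, matching the range given above.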

The  recurrent  network  at  this  point  consisted  of  7  inputs  (binary  representation  of 
codewords  plus  header),  six  sigmoidal  outputs,  and  12  hidden  nodes.  Ten  networks  were 
trained  over  400  epochs,  using  an  initial  learning  rate  of  0.01  and  a  momentum  of  0.98. 
After  training,  the  net  reported  an  average  89.7%  accuracy  rate  in  recognizing  the  data 
points  in  the  training  file.  The  trained  networks  were  then  tested,  using  the  10%  of  the 
data source file reserved for this purpose. On average, the nets recognized
89.9% of the test data points. Figure 30 shows how the recognized data points translate
into  identified  sequences. 


Figure 30: Response of subgrouped RTRL network for sequence test file versus the desired
categorical output

Many  of  the  sequences  were  identified  immediately  while  others  experienced 
some  transients,  usually  at  the  beginning  of  the  sequence,  during  which  the  net 
misclassified  those  data  points.  Because  of  this,  the  first  data  points  in  the  sequences 
were  ignored  when  determining  the  vehicle  selected  by  the  network.  If  the  class  most 
frequently  provided  by  the  net  in  the  last  seven  points  of  each  sequence  is  used  to  classify 
it,  the  trained  network  with  the  highest  accuracy  correctly  identified  almost  all  (99.22%) 
of the test sequences. Average accuracy was 96.13%, with a standard deviation of 2.80.
It was surprising how quickly the net selected the correct vehicle in many of the
sequences, implying that independent of sequence length, the information necessary to
identify the vehicle is often found within the first two or three values of each sequence.
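The sequence scoring rule described above (the class chosen most often within the last seven points of a sequence) can be sketched as below. The function name classify_sequence, the MAX_CLASSES limit and the tie-breaking rule are assumptions for illustration only.

```c
#include <stddef.h>

#define MAX_CLASSES 16

/* Classify a sequence by the class chosen most often among its last
   `window` data points. winners[t] is the winning output index at point t.
   Returns the class index, or -1 on invalid input. Ties go to the lowest
   class index, an arbitrary choice made for this sketch. */
int classify_sequence(const int *winners, size_t len,
                      size_t window, int classes)
{
    int counts[MAX_CLASSES] = {0};
    if (len == 0 || classes <= 0 || classes > MAX_CLASSES)
        return -1;
    size_t start = (len > window) ? len - window : 0;
    for (size_t t = start; t < len; t++) {
        if (winners[t] < 0 || winners[t] >= classes)
            return -1;
        counts[winners[t]]++;
    }
    int best = 0;
    for (int c = 1; c < classes; c++)
        if (counts[c] > counts[best])
            best = c;
    return best;
}
```

Calling classify_sequence with a window of 7 ignores any misclassified transients at the start of a 14 to 20 point sequence.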

4.4  Summary 

The subgrouped RTRL net was tested both to determine the impact of the
parameters added to enhance performance, and to determine the capabilities and
limitations of the network in solving time dependent problems. Each parameter
(momentum,  minimum  sigmoidal  derivative  factor,  weight  update  skipping,  continuity  of 
recurrence  between  epochs,  and  maximum  sigmoidal  input)  was  varied  using  the  broad 
class  phoneme  problem,  and  the  impact  of  the  modification  was  evaluated. 

The  effects  of  the  network  parameters  varied  in  impact,  with  the  biggest 
improvement  gained  by  setting  a  minimum  value  for  the  sigmoidal  derivative  factor  in  the 
weight  updates.  Momentum  and  the  initial  learning  rate  each  impacted  performance  to  a 
lesser  extent,  and  the  best  values  for  these  parameters  are  problem  dependent,  found 
through  trial  and  error.  Weight  update  skipping  provided  enough  acceleration  in  network 
training  time  that  it  more  than  compensated  for  the  small  fluctuations  in  accuracy  it 
caused,  and  teacher  forced  learning  either  did  not  help  net  accuracy  or  dramatically 
decreased  accuracy  when  the  net  was  tested.  Removal  of  the  discontinuity  in  data  and 
recurrent  outputs  between  training  epochs  eliminated  a  spurious  impulse  response 
observed  when  training  the  net  to  emulate  a  low  pass  filter. 

The subgrouped RTRL was also tested to determine whether it was functionally
equivalent in performance and characteristics to the RTRL algorithm evaluated in Capt
Randall Lindsey's thesis(7). The net was tested on the XOR problem, the internal state
problem and the Butterworth filter emulation problems that were discussed in Lindsey's
thesis. For each problem, the subgrouped RTRL network performed as well as or better than
the original algorithm.

For the XOR problem, the subgrouped RTRL net performed similarly in behavior
and accuracy to the RTRL network as described in Lindsey's thesis, exhibiting the same
temporal dependence in its selection of valid and invalid XOR inputs, with the network
misses scattered across the problem space. Also as in Lindsey's thesis, the subgrouped
RTRL network solved the internal state problem with 100% accuracy. For the
Butterworth filter problem, the subgrouped RTRL net matched the required output more
closely than the RTRL network, which is attributed to the removal of the discontinuity in
the impulse response training data and in the recurrent network outputs between training
epochs.

Both  forms  of  RTRL  networks  were  also  applied  to  the  problem  of  determining 
broad  phoneme  class  categories  for  a  single  voice  file,  to  quantify  differences  in  training 
time  and  accuracy.  The  number  of  hidden  nodes  used  by  each  network  was  varied  during 
the  training  trials,  to  plot  net  size  against  training  time.  Because  the  subgrouped  RTRL 
algorithm  could  be  accelerated  by  using  weight  update  skipping  it  was  tested  twice,  once 
with  weight  update  skipping  disabled  and  again  with  the  error  threshold  for  skipping 
weight  updates  set  at  a  low  (0.0001)  level. 

The  subgrouped  RTRL  net  performed  significantly  better  than  the  original  RTRL 
net  in  the  time  required  to  process  the  training  data  (a  7  -  37  times  increase  in  training 
speed),  but  it  appears  subgrouping  does  cause  a  tradeoff  (8%  decrease)  in  network 
accuracy.  There  was  also  an  additional  slight  tradeoff  in  network  accuracy  (1%)  for  a 
reduced  processing  time  when  the  subgrouped  RTRL  net  trained  with  skipping  enabled. 

After characterizing the performance of the subgrouped RTRL network, it was
applied to two problems: stock market value prediction and vehicle image recognition.
The network was able to match the predicted value it was trained to produce relatively
well, but the net output consistently lagged the desired predicted value. Because of this,
the subgrouped RTRL algorithm would not make a useful tool for any stock analyst if
trained in the same manner.

The subgrouped RTRL network performed very well in
identifying the five different types of vehicles, based on the sequence of image features
provided. The net was only successful in learning this task after the correct format for the
input data was applied, i.e. the inputs were expressed in binary to allow the net to
differentiate between each codeword as a separate and distinct state.

V Conclusions and Recommendations

This  thesis  represents  an  effort  to  improve  on  the  functionality  and  speed  of  the  RTRL 
algorithm  documented  in  Capt  Randall  Lindsey's  Master's  thesis(7).  This  effort  was  performed 
because  of  the  wide  applicability  of  a  time  dependent  neural  network  to  technical  problems  facing 
the  Air  Force  today. 

5.1  Conclusions 

The  subgrouped  RTRL  algorithm  has  been  demonstrated  to  be  able  to  solve  multiple  time 
dependent  problems.  Chapter  IV  details  how  several  of  the  network  parameters  enhanced 
performance  in  network  accuracy  and/or  time  required  to  process  training  data.  The  network  was 
able  to  solve  problems  identical  or  similar  to  those  that  were  solvable  with  the  original  RTRL 
algorithm,  so  it  appears  that  subgrouping  does  not  reduce  the  functionality  of  the  network.  These 
problems (XOR, internal state, second order IIR Butterworth filter simulation) demonstrated the
functional  equivalence  of  the  two  algorithms.  It  was  also  demonstrated,  using  the  broad  class 
phoneme  identification  problem,  that  the  subgrouped  RTRL  trained  in  significantly  less  time,  but 
with  less  accuracy  than  the  original  RTRL  network. 

The  subgrouped  RTRL  algorithm  was  applied  to  two  problems:  stock  market  opening 
value  prediction  and  vehicle  image  identification.  While  closely  approximating  the  predicted  value 
of  the  stock  market,  the  net  lagged  behind  the  market  behavior  enough  to  make  it  unwise  to  use  it 
as  a  prediction  tool.  The  net  performed  very  well  in  identifying  vehicle  images  based  on  time 
varying  image  features  when  the  problem  was  presented  properly. 

5.2  Recommendations 

Based  on  the  results  of  comparing  the  two  networks,  it  is  recommended  that  of  the  two 
forms  of  RTRL  networks,  the  subgrouped  RTRL  network  be  applied  to  temporally  dependent 
problems first. If the net fails to provide the required accuracy for the task, then the original RTRL
network should be tried. Other avenues for speeding up the RTRL network should also be
explored, such as locking those p matrix values that do not change over time, so that the net does
not  waste  training  time  updating  them.  It  may  be  possible  to  start  training  with  a  large,  multiple 
hidden  node  network,  and  gradually  cull  out  the  weights  that  remain  sufficiently  small.  From  a 
programming  perspective,  this  would  be  less  complex  to  achieve  than  incrementally  enlarging  the 
net  from  a  smaller  configuration  to  improve  accuracy. 

On  evaluating  the  differences  in  performance  between  the  identification  of  vehicle  classes 
and  broad  phoneme  classes,  it  might  be  beneficial  to  employ  similar  processing  techniques  on  the 
voice data to those used to process the vehicle images. The image data was Fast Fourier
Transformed (similar in function to the Payton(8) process) and then vector quantized using a
clustering algorithm. It may greatly improve the subgrouped RTRL's performance to use a
clustering algorithm on the Payton processed voice data, and use the cluster coordinates (or
representative codewords) for training the RTRL net. The network would then be using the
information embedded in the sequence of data provided to learn to differentiate phonemes,
rather than the data points themselves.

5.3  Future  Research 

It  is  apparent  from  the  testing  of  the  subgrouped  RTRL  network  that  information  for 
solving  complex  problems  may  be  found  not  only  in  features  found  at  each  point  in  time,  but  also 
in  how  the  features  change  over  time.  The  impact  of  temporally  changing  information  on 
classification  and  recognition  problems  needs  to  be  further  explored.  Many  problems  being 
attacked  at  this  time  from  a  static  viewpoint  may  become  more  solvable  if  the  added  dimension  of 
time  is  used,  particularly  in  the  area  of  feature  recognition.  Perhaps  time  varying  features  found  in 
aerial  views,  or  in  moving  faces,  may  hold  the  clue  for  rapid  identification. 




Appendix  A.  Software  Development 


The C code for the subgrouped RTRL network is found in Appendix B, along
with associated files required for its compilation and operation. The neural
network source file used for this thesis is named "recnet.c." The ANSI C code has been run, with
minor modifications, on Sun workstations, NeXT workstations, and on a 486 processor
IBM compatible PC.

The format for running recnet is "recnet [datafile] [t]". Datafile represents the
name of the file containing the network training or test data. If not provided, the net will
look for a file named "data.dat" for training data. If "t" (or any added third term) is
included with the file name, the net uses the datafile as a test file, based on the weight
values stored in "weights.dat."
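The command line convention just described can be summarized in a few lines of C. This is a sketch of the stated behavior only, not the argument handling in recnet.c; the RunOptions type and the function parse_run_options are hypothetical names.

```c
#include <stddef.h>

typedef struct {
    const char *datafile;  /* training or test data file */
    int test_mode;         /* nonzero when a third term is present */
} RunOptions;

/* Interpret the command line as: recnet [datafile] [t].
   With no datafile, "data.dat" is used; any third term selects testing. */
RunOptions parse_run_options(int argc, char **argv)
{
    RunOptions opts;
    opts.datafile = (argc > 1) ? argv[1] : "data.dat";
    opts.test_mode = (argc > 2);
    return opts;
}
```

For example, "recnet voice.dat t" (with a hypothetical file name) would select test mode on voice.dat, while "recnet" alone would train on data.dat.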

A.1  File  Parameters 

At  initialization,  recnet  requires  a  parameter  file  named  parameter.dat 
(parametr.dat  on  PCs)  to  load  in  the  operating  parameters  it  will  train  or  test  under.  The 
following  represents  the  parameters  used  for  most  of  the  tests  described  in  this  thesis: 


    epochs    alpha     seed      moment    y_pr_min
    100       0.01      152367    0.0       0.01

    weights   linear    teacher   skip      cat       loop_data
    0         0         0         0.0000    1         0

    verbose   max_val   bp_factor
    1         50        0.00

    keep_sum  OK_tshold preview
    0.000     0.1       0

The  epochs  value  determines  the  number  of  training  epochs  the  network  will  run. 
The  learning  coefficient,  alpha,  is  set  at  the  beginning  of  the  training  run  but  is  halved 
when  the  error  rate  does  not  change  or  when  the  error  reported  climbs  more  than  a  set 
threshold as the network trains. The seed value is used to initialize the random number
generator, used to create the initial weight values. Moment refers to the momentum
factor, while y_pr_min is the minimum sigmoidal derivative factor set for the output
neurons.
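Two of the parameters above lend themselves to a short illustration. The sketch below shows one plausible reading of the alpha halving rule and of the minimum sigmoidal derivative factor; it is not taken from recnet.c. The function names, the climb_limit argument, and the interpretation of y_pr_min as a floor on y(1 - y) are all assumptions.

```c
/* Learning rate for the next epoch: halved when the epoch error is
   unchanged or has risen by more than climb_limit, otherwise kept.
   climb_limit stands in for the "set threshold" named in the text. */
double next_alpha(double alpha, double prev_error, double error,
                  double climb_limit)
{
    if (error == prev_error || error - prev_error > climb_limit)
        return alpha * 0.5;
    return alpha;
}

/* Sigmoid derivative y(1 - y), floored at y_pr_min so that a saturated
   output neuron still contributes to the weight update (assumed reading). */
double sigmoid_derivative_min(double y, double y_pr_min)
{
    double d = y * (1.0 - y);
    return (d < y_pr_min) ? y_pr_min : d;
}
```

Under this reading, an output neuron driven close to 0 or 1 keeps a derivative of at least y_pr_min (0.01 in the parameters listed), rather than one that shrinks toward zero.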

Weights is a flag set to 1 if the net is to continue using the weights found in the
file "weights.dat," while a value of 0 tells the net to create new weight values. Linear is
another flag, in which 1 tells the net to output the activation values of the output neurons
and 0 causes the net to provide a sigmoidal output. The teacher flag is set to 1 to enable
teacher forced learning, 0 to disable the function. A 1 set for the double flag causes the
network to pass through the training data twice during each epoch, with weight updates
disabled during the second run. A 0 disables this feature. Skip sets the error threshold
during each iteration; above the threshold the net performs weight updates, below the
threshold the updates are skipped. Cat set with a value of one tells the net to score the
outputs as categories, selecting the output with the highest activation value. A zero on
this flag makes the net score each output as good or bad based on whether the output error
exceeds the threshold given in OK_threshold. The loopdata flag causes the net to not
zero the net output values and p matrix at the end of each epoch when enabled.
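The skip threshold test can be pictured as a single comparison made at each iteration. The following is a minimal sketch and not the recnet implementation: the function name should_update_weights is hypothetical, and the error measure (sum of squared output errors) is an assumption, since the text does not define it.

```c
#include <stddef.h>

/* Decide whether to run the weight update for this iteration.
   The error measure (sum of squared output errors) is an assumption.
   A threshold of 0 never skips, because no error falls below it. */
int should_update_weights(const double *out, const double *desired,
                          size_t n_out, double skip_threshold)
{
    double err = 0.0;
    for (size_t i = 0; i < n_out; i++) {
        double d = desired[i] - out[i];
        err += d * d;
    }
    return err >= skip_threshold;
}
```

Iterations on which the function returns 0 would bypass the weight update, which is where the saving in training time comes from.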

The verbose flag enables (or disables) the net's output of information to the
screen. Maxval sets the threshold for the activation value of a neuron above which
the sigmoidal output is set at one, while a value below the negative of this limit causes
the neuron to output a zero. Bp_factor sets the amount by which the backprop algorithm
added to the net can influence the weight updates, and usually ranges from 0 to 1.
Keep_sum gives the net the factor by which it multiplies the neural activation values
between data iterations, allowing the past neural activity to influence its current output.
OK_threshold is the error threshold for the output neurons, used whenever the categorical
scoring flag is off. If the error at an output node is within the threshold, it is considered
good. Preview is a flag that allows training on the first 25% of the data file, during the
first 25% of the training epochs. If the data is uniformly distributed, the net can quickly
generalise on the first 25% and train on the full file for the remaining 75% of the training
epochs.

A.2  Output 

Recnet creates several files when run, depending on the function selected.
These files, and the conditions that cause them to be created, are:

Training the network:

weights.dat - saves the values of the weight matrix when training is concluded

netout.dat - generated at the end of training, this file contains the network outputs
generated during the last epoch, paired with the desired values, in a format that
allows training a network based on the net outputs and the desired
outputs.

netout2.dat - same as netout.dat, except the activation values of the network are
paired with the desired output values.

sequence.dat - created at the end of training when the categorical output function
is enabled. Pairs the winning network output with the desired output so
that net accuracy can be determined.
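
A minimal sketch of how the pairs in sequence.dat could be scored is given below. It assumes each data line holds two whitespace-separated integers (winning output, then desired output) and that any trailing text, such as a per-category summary, ends the list of pairs. The function name sequence_accuracy is hypothetical and is not part of Recnet.

```c
#include <assert.h>
#include <stdio.h>

/* Read "winner desired" integer pairs until the first line that does not
   parse as two integers. Returns the fraction of pairs in which the
   winner equals the desired category, or -1.0 if no pairs were read. */
double sequence_accuracy(FILE *fp)
{
    int out, desired;
    long total = 0, hits = 0;

    assert(fp != NULL);
    while (fscanf(fp, "%d %d", &out, &desired) == 2) {
        total++;
        if (out == desired)
            hits++;
    }
    return total > 0 ? (double)hits / (double)total : -1.0;
}
```

Called on a stream opened with fopen("sequence.dat", "r"), the function gives the overall fraction of correctly classified iterations.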

Testing the network:

tstcheck.dat - pairs network outputs with desired outputs

testdes.dat - contains the network's desired training values, against which
the network output was scored during test

error_tst.dat - provides the net's cumulative error and score as the net passes
through the test data

sequence.dat - same purpose as in training, except that it compares the net output with
the desired categorical output of the test data

Appendix B: Recurrent Neural Network Code


This appendix contains a listing of the subgrouped real time recurrent learning
source code and its associated files. The files "nrutil.c" and "ran1.c" were derived from
the Numerical Recipes in C book (11).

/*  RECNET.C 


A  recurrent  neural  network  which  follows  the  algorithm 
proposed  by  Williams  and  Zipser  in  their  paper  "A  Learning 
Algorithm  for  Continually  Running  Fully  Recurrent 
Neural  Networks",  Neural  Computation  1, 270-280  (1989). 

date:  30  May  91 
update:  7  Mar  94 

written by: Randall L. Lindsey, GEO-91D
modified  by:  Jeffrey  S.  Dean,  PTS-92D 

************************************************************************ 

*/ 

#include <stdlib.h>
#include <stdio.h>
#include "definitions.h"
#include "macros.h"
#include <math.h>
#include <string.h>

/***********************************************************************
ROUTINE NAME: main

DESCRIPTION: Based on the number of arguments presented when recnet
             is invoked, main causes the net to:

             a. Train on file data.dat
             b. Train on the filename following recnet
             c. Test the accuracy of trained network on the filename
                data

INPUTS: argc - count of arguments following recnet when initiated
        argv - array of argument strings given to recnet

FUNCTIONS CALLED:
    check_file()   - determines if datafile exists
    init_net()     - initializes the network. Allocates memory for vectors
                     and matrices, and initializes them to zero. Sets
                     random weight values.
    read_data()    - reads the data from the input file, which include the
                     input vectors and training outputs.
    read_weights() - reads weights from prior training session, to continue
                     training from that point when the weights were saved.
    train_net()    - trains net based on inputs and training data.

CALLED BY: None

LAST UPDATED: 19 May 1993 BY: Jeffrey S. Dean

***********************************************************************/

main(argc,argv)
int argc;
char *argv[];
{
    switch (argc) {

    case 1:     /* selected if user types "recnet" at prompt.
                   Trains network using data in "data.dat". */
        datafile="data.dat";  /* Default name of datafile. */
        check_file();         /* Check to see if the datafile name exists. */
        init_net(1);          /* Initialize and define all network variables.
                                 Allocate memory for all vectors and matrices
                                 and set initially to zero. Randomly set the
                                 weight matrix using the pseudo-random number
                                 generator. */
        read_data();          /* Read data vector array and desired output */
        if(weights==1)
            read_weights();   /* Read old weights, if restarting learning */
        train_net();          /* Propagate inputs and update weights based on
                                 gradient descent */
        break;

    case 2:     /* selected if user types "recnet <filename>"
                   at prompt. Trains network using <filename>
                   data. */
        datafile=argv[1];     /* User specified name of datafile. */
        check_file();         /* Check to see if the datafile name exists. */
        init_net(1);          /* Initialize and define all network variables.
                                 Allocate memory for all vectors and matrices
                                 and set initially to zero. Randomly set the
                                 weight matrix using the pseudo-random number
                                 generator. */
        read_data();          /* Read data vector array and desired output */
        if(weights==1)
            read_weights();   /* Read old weights, if restarting learning */
        train_net();          /* Propagate inputs, compute outputs, and
                                 update weights based on gradient descent */
        break;

    case 3:     /* selected if user types "recnet <filename> t"
                   at prompt. Tests network using <filename>
                   data. */
        datafile=argv[1];     /* User specified name of datafile. */
        check_file();         /* Check to see if the datafile name exists. */
        init_net(2);          /* Initialize and define all network variables.
                                 Allocate memory for all vectors and matrices
                                 and set initially to zero. Randomly set the
                                 weight matrix using the pseudo-random number
                                 generator. */
        read_weights();       /* Read weight matrix and saved p states. */
        read_data();          /* Read data vector array and desired output */
        test_net();           /* Propagate inputs and compute outputs. */
        break;

    default:
        printf("\nUsage: recnet [datafilename.dat] [testflag]\n\n");
        break;
    }

    return 0;

}  /* End MAIN() of NET.C */

/***********************************************************************
ROUTINE NAME: train_net()

DESCRIPTION: Trains the RTRL net over the selected number of epochs. The
             user has several options, selected in the "parameters.dat"
             file. He can:

             - Set the error level above which the net updates its weights.
               Skipping weight updates for accurate outputs can speed learning.
             - Suppress stdout output of net status. Helps in running
               net in background through automatic backup of host.
             - If output of net represents category membership (1 = member)
               error output of net gives error/times category valid.
             - Have the net "preview" the training data by training on first 25%
               of the data during the first 25% of the training epochs. Training
               data must be homogeneous, i.e. the distribution of output classes
               must be spread throughout the data.

INPUTS: None

FUNCTIONS CALLED: net_loop()     - Passes data through loop, determines error
                  update()       - Updates weight matrix
                  reset_p()      - Zeros out p matrix, output vector
                  save_weights() - Saves weights of network, plus the outputs
                                   (activation function and sigmoid) of the net for
                                   one pass through the data

CALLED BY: main()

LAST UPDATED: 7 Mar 94 BY: Jeffrey S. Dean

***********************************************************************/

void train_net()   /* Written 10 Jun 91, RLL. */
{
    /* Begin main loop portion */
    int numvectors;
    float climb;
    float min_error;

    ofp=fopen("error.dat", "w");
    fprintf(ofp, "Total error and percent correct per epoch:\n");
    fprintf(ofp, "Epoch\terror\t\tpercent correct\n");

    numvectors = num_vectors;  /* Set temp variable - number of data vectors */
    min_error = 0.;
    J[1] = J[0] = 0.;

    for(a=0;a<epochs;a++) {
        /* If preview selected, 1st 25% of epochs train on
           first 25% of training data. */
        if(preview==1 && a<(float)epochs*.25) num_vectors = numvectors*.25;
        else num_vectors = numvectors;

        net_loop(1);  /* Pass inputs through net, determine error */
        reset_p();    /* Zero p_old[][][] matrix for next epoch. */

        if(verbose==1) {  /* If stdout output desired */
            printf("\n%d\t%s %f\t",a,"total error =",J[1]);
            printf("%% correct = %5.2f\t",(float)good/(float)num_vectors*100);
            printf("Skipped %5.2f %%\n",(float)skip/(float)num_vectors*100);
        }

        fprintf(ofp,"%d\t%f\t%f\n",a,J[1],(float)good/(float)num_vectors*100);
        if(a==0)
            min_error = J[1];  /* Capture lowest output error */
        min_error = min_error < J[1] ? min_error : J[1];
        climb = J[1] - min_error;
        if(a>3)
            if((climb>0.01*min_error)||climb>10||fabs(J[1]-J[0])<0.0000001) {
                alpha = alpha/10.;
                min_error = J[1];
                printf("alpha = %f\n", alpha);
            }

        if (J[1]<0.000005||alpha<0.000000001) {  /* If total error is less than an arbitrary */
            save_weights();                      /* fractional value, then exit */
            fclose(ofp);
            if(verbose==1)
                printf("Stopped on epoch %d\n",a);
            exit(0);
        }
    }  /* End main loop portion */

    fclose(ofp);
    save_weights();  /* Save weights, input vector z, and desired
                        output to a data file for future use. */
    return;

}  /* end function train_net() */


/***********************************************************************

ROUTINE NAME: test_net()

DESCRIPTION: Tests the network accuracy against the data in a test file.
             Calls save_testfiles() to save test data, the output of the
             net as it passes through the test data, and the desired outputs

INPUTS: none

FUNCTIONS CALLED: net_loop()
CALLED BY: main()

LAST UPDATED: 7 Mar 94 BY: Jeffrey S. Dean

***********************************************************************/

void test_net()
{
    /* Begin main loop portion */

    ofp=fopen("tstcheck.dat", "w");  /* Open files to record test */
    efp=fopen("testdes.dat", "w");
    yfp=fopen("error_tst.dat", "w");
    if(cat_out==1)
        ufp=fopen("sequence.dat", "w");

    net_loop(2);  /* Pass data through the net, determine error */

    fclose(ofp);
    fclose(efp);
    fclose(yfp);
    if(cat_out==1)
        fclose(ufp);
    if(verbose==1) {
        printf("%f percent correct\n",(float)good/(float)num_vectors*100.);
        printf("File 'testcheck.dat' contains test data.\n");
        printf("File 'testdes.dat' contains desired net output test data.\n");
        printf("File 'error_tst.dat' contains test error data.\n");
    }

    return;

}  /* end function test_net() */


/***********************************************************************

ROUTINE NAME: compute_error()

DESCRIPTION: Computes the error of the net output versus desired output.
             e[k] - error of output at this point in time
             error_vec[k] - error for output k this iteration
             J[1] - Cumulative error on all outputs this epoch

INPUTS: none

FUNCTIONS CALLED: check_if_good() - determines whether the output of the net
                  is close enough to the desired output to be valid
CALLED BY: train_net() and test_net()
LAST UPDATED: 7 Mar 94 BY: Jeffrey S. Dean

***********************************************************************/

void compute_error()
{
    /* Compute error at time t based on desired output values. Returns a
       zero error for t=0 on first epoch. */

    loopk(num_outputs)
        error_vec[k] = e[k] = 0.;
    error = 0.;

    if (t>=td || loop_data==1)
        loopk(num_outputs) {
            e[k] = d[t][k] - y[k*gsize];

            /* Calculate error per output and overall error this iteration */
            error_vec[k] = 0.5 * e[k] * e[k];
            error += error_vec[k];
        }

    if(a==0&&cat_out==1)
        loopk(num_outputs)            /* If using categories & 1st epoch */
            out_count[k] += d[t][k];  /* tally up how many times each */
                                      /* category appears */

    J[1] += error;
    good += check_if_good(t);
    return;

}

/***********************************************************************

ROUTINE NAME: propagate()

DESCRIPTION: Passes net output from iteration t-1 to net inputs for
             iteration t. If teacher forcing function selected, t-1
             outputs to net input replaced with net desired output at t-1.
             Noise is added to the inputs (level entered in parameters.dat)
             proportional to range of input values.

INPUTS: Flag (train) to determine if net is training (=1) or testing (=2)

FUNCTIONS CALLED: none
CALLED BY: net_loop()

LAST UPDATED: 7 Mar 94 BY: Jeffrey S. Dean

***********************************************************************/

void propagate(train)
/* Computes the state of the net at time t, and initializes the z vector for time t */
int train;
{
    float max, min, diff;

    /* Set previous outputs y[k] as part of the next input z[t][k+m]. */
    loopk(nrows)
        z[t][k+m] = y[k];

    if(teacher==1&&train==1)   /* if teacher forced learning selected, pass */
        loopk(ngroups)         /* previous desired net outputs to net input */
            z[t][m+k*gsize] = d[t][k];

    loopk(nrows)
        loopi(ncols)
            s[k] += w[k][i]*z[t][i];  /* sum weighted inputs */

    return;

}


/***********************************************************************

ROUTINE NAME: compute_output()

DESCRIPTION: Apply non-linear squashing function (sigmoid) to net output
             and hidden layer nodes, unless linear output selected. If
             selected, net output nodes receive node summation function
             output.

INPUTS: none

FUNCTIONS CALLED: sigmoid()
CALLED BY: net_loop()

LAST UPDATED: 7 Mar 94 BY: Jeffrey S. Dean

***********************************************************************/
void compute_output()  /* Computes the output at time (t+1), ie y(t+1). */
{
    /* Process each of the k nodes as sigmoidal functions with input s[t]
       unless linear is selected, in which only output nodes are linear
       functions of s[t] and the remaining hidden nodes remain sigmoidal.
       The output computed is y[k] = y(t+1) = f(s[t]). */

    loopk(nrows)
        y[k] = sigmoid(s[k]);  /* Here, y[k]=y(t+1). */

    if(linear==1)              /* if linear selected, output is summation */
        loopk(num_outputs)     /* function for output nodes */
            y[k*gsize] = s[k*gsize];

    return;

}


/***********************************************************************

ROUTINE NAME: update()

DESCRIPTION: Updates weight matrix. Weights can have noise added to update
             to avoid memorizing the exact data path.

             Variable definitions needed to understand subgrouped RTRL:

             - g1 is an offset to position the algorithm at the beginning
               of each subgroup
             - gsize is the size of any subgroup (1 output + hidden nodes)
             - ngroups is the number of subgroups in the net (= # outputs)

INPUTS: none

FUNCTIONS CALLED: none
CALLED BY: train_net()

LAST UPDATED: 7 Mar 94 BY: Jeffrey S. Dean

***********************************************************************/


void update()
{
    /* Compute change of weights at time t. delw is reset to zero at each
       iteration (time step), and p_old is p(t). */

    /* weight changes in subgroup node i = learning rate*output error*
       (change in net output g)/(changes in subgroup node output i
       during t-1) */

    loopg(ngroups)              /* For each subgroup */
        loopij(gsize,ncols) {   /* Change in weight for each node in */
                                /* subgroup between node and input */
            delw[g*gsize+i][j] += alpha*e[g]*p_old[i][j][g*gsize];
        }

    /* Update rules. Computes p(t+1). */
    loopk(nrows)
        yprime[k] = y[k]*(1.-y[k]);  /* Sigmoid function derivative */
    loopk(num_outputs) {
        g1 = k*gsize;  /* g1 points to output node k, first node in subgroup */
        if(linear==0)  /* yp_min sets lower limit for y_prime if output is
                          sigmoidal. Speeds up training if sigmoid
                          derivative can not equal zero. */
            yprime[g1] = yp_min<yprime[g1] ? yprime[g1] : yp_min;
        else
            yprime[g1] = 1.;  /* If output linear, y_prime = 1 */
    }

    loopg(ngroups)                  /* For each subgroup in the network */
        loopi(gsize)                /* For each node in the subgroup */
            loopj(ncols)            /* For each input into the network */
                loopk(gsize) {      /* loop within subgroup */
                    kron = 0.0;
                    if (i==k) kron = 1.0;  /* If input is neuron i's t-1 value */
                                           /* use input in p matrix update */
                    g1 = g*gsize;          /* subgroup offset */

                    /* Sum the product of the p matrix within this subgroup with
                       the weight interconnects between the subgroup in the output
                       layer and the t-1 subgroup values in the net input layer */
                    sum = 0.;
                    loopl(gsize)
                        if(teacher!=1||l>0)
                            sum += w[k+g1][g1+l+m]*p_old[i][j][l+g1];

                    /* Update the p matrix */
                    p[i][j][k+g1] = yprime[k+g1]*(sum+kron*z[t][j]);
                }  /* p[][][] is now for time p(t+1). */

    /* Update weights. Computes weights for time w(t+1). */
    loopij(nrows,ncols)
        w[i][j] += delw[i][j];

    /* Save partial derivative for next iteration (time t+1) and reset
       p matrix by swapping the pointers of the old p matrix with the new
       p matrix. */
    p_temp = p_old;
    p_old = p;  /* p_old is now p(t+1). */
    p = p_temp;

    return;

}

/***********************************************************************

ROUTINE NAME: reset_delw_s()

DESCRIPTION: Resets the delta weight matrix. Can be set to zero, or can
             retain some of the last weight changes as a momentum factor.
             Activation outputs for the output layer nodes can have selected
             portion retained.

INPUTS: none

FUNCTIONS CALLED: none
CALLED BY: net_loop() and save_weights()
LAST UPDATED: 7 Mar 94 BY: Jeffrey S. Dean

***********************************************************************/
void reset_delw_s()
{
    /* Reset delta weights using momentum term and reset node sum using */
    /* keep_sum term for next calculation. */

    loopij(nrows,ncols)          /* delta weights multiplied by */
        delw[i][j] *= momentum;  /* momentum factor */

    loopi(nrows)                 /* Allows use of a kind of activation */
        s[i] *= keep_sum;        /* function momentum, or a neuron */
                                 /* stimulus that decays over time */
    return;

}

/***********************************************************************
ROUTINE NAME: reset_p()

DESCRIPTION: Reinitializes old p matrix and output layer node values.

INPUTS: none

FUNCTIONS CALLED: none

CALLED BY: train_net()

LAST UPDATED: 7 Mar 94 BY: Jeffrey S. Dean

***********************************************************************/
void reset_p()
{
    /* Zero p_old[][][] for next calculation. */
    if(loop_data==0) {
        loopg(gsize)
            loopj(ncols)
                loopk(nrows)
                    p_old[g][j][k] = 0.;

        loopi(nrows)
            y[i] = 0.;
    }

    return;

}

/***********************************************************************

ROUTINE NAME: sigmoid()

DESCRIPTION: Provides sigmoidal squashing function
INPUTS: single precision floating point number
FUNCTIONS CALLED: none
CALLED BY: compute_output()

LAST UPDATED: 7 Mar 94 BY: Jeffrey S. Dean

***********************************************************************/

float  sigmoid(x) 
float  x; 

{ 

if  (x  >  maxval) 
return  1.0; 
if  (x  <  -maxval) 
return  0.0; 

return  1/(1  +  exp(-x)); 

}  /*  end  sigmoid  */ 

/***********************************************************************

ROUTINE NAME: init_net()

DESCRIPTION: Reads net operating parameters from "parameters.dat" file, as
             well as from the data file.

INPUTS: Flag determining whether net will be trained or tested.

FUNCTIONS CALLED:
    fskip_line() - skips line in input file
    ivector()    - allocates memory for integer vector
    vector()     - allocates memory for floating point vector
    matrix()     - allocates memory for floating point matrix
    matrix3d()   - allocates memory for 3-D fp matrix
    ran1()       - random number generator
CALLED BY: main()

LAST UPDATED: 7 Mar 94 BY: Jeffrey S. Dean

***********************************************************************/

void init_net(train)
int train;
{
    char junk_response[256];
    int nrows_w;

    /* Read data from the input file "parameters.dat" */
    if((ifp=fopen("parameters.dat", "r"))==NULL)
        printf("Error opening parameter file\n");
    if((fgets(junk_response, 256, ifp))==NULL) {
        printf("Can't get junk line from parameters file\n");
        exit(0);
    }

    fscanf(ifp,"%d %f %d",&epochs,&alpha,&seed);
    fscanf(ifp,"%f %f",&momentum,&yp_min);
    fskip_line(ifp);
    fskip_line(ifp);
    fscanf(ifp,"%d %d %d",&weights,&linear,&teacher);
    fscanf(ifp,"%f %d %d",&skip_threshold,&cat_out,&loop_data);
    fskip_line(ifp);
    fskip_line(ifp);
    fscanf(ifp,"%d %d %f",&verbose,&max_val,&bp_factor);
    fskip_line(ifp);
    fskip_line(ifp);
    fscanf(ifp,"%f %f %d", &keep_sum, &OK_threshold,&preview);
    fclose(ifp);

    /* Read data from the input file datafile (user specified) */
    ifp=fopen(datafile, "r");
    fscanf(ifp,"%d %d %d",&num_inputs,&num_outputs,&num_nodes);
    fscanf(ifp,"%d %d",&num_vectors,&td);
    fclose(ifp);

    if(num_nodes%num_outputs!=0)  /* Add hidden nodes until each
                                     subgroup has the same amount */
        num_nodes = ((int)(num_nodes/num_outputs)+1) * num_outputs;

    /* Output operating parameters to stdout, if selected */
    if(verbose==1) {
        printf("Recurrent neural net parameters:\n");
        printf("Input file\n");
        printf("%s\n", datafile);
        if(train==1) {
            printf("epochs\t\talpha\t\tmomentum\ty_prime_min\n");
            printf("%d\t\t%f\t%f\t%f\n\n",epochs,alpha,momentum,yp_min);
            printf("vectors\t\tskip threshold\tbp_factor\n");
            printf("%d\t\t%f\t%f\n\n",num_vectors,skip_threshold,bp_factor);
            printf("keep_sum\tmax_val\t\tdata_loop\n");
            printf("%f\t%d\t\t", keep_sum, max_val);
            if(loop_data==1) printf("Enabled\n\n");
            else printf("Disabled\n\n");
        }

        printf("inputs\t\toutputs\t\thidden nodes\ttime delay\n");
        printf("%d\t\t%d\t\t%d\t\t%d\n\n",num_inputs,num_outputs,
               num_nodes-num_outputs, td);
        printf("Weights\t\tOutput\t\tCategories\tPreview\n");
        if(weights==1||train==2) printf("Old\t\t");
        else printf("New\t\t");
        if(linear==1) printf("Linear\t\t");
        else printf("Sigmoid\t\t");
        if(cat_out==1) printf("Yes\t\t");
        else printf("No\t\t");
        if(preview==1) printf("Enabled\n\n");
        else printf("Disabled\n\n");
        if(cat_out==0)
            printf("\nOK_threshold\n%f\n",OK_threshold);
    }

    if(weights == 1 || train == 2) {
        if((ifp=fopen("weights.dat", "r"))==NULL) {
            printf("Error opening weight file\n");
            exit(0);
        }

        fscanf(ifp,"%d",&nrows_w);
        fclose(ifp);

        if(nrows_w != num_nodes) {
            printf("*** Warning! Weights don't match data configuration! ***\n");
            printf("***Replacing data configuration to match weights.***\n");
            num_nodes = nrows_w;
        }
    }

    m = num_inputs + 1;             /* # of external inputs plus bias */
    nrows = n = num_nodes;          /* # of rows for weight matrix */
    ncols = m+num_nodes;            /* # of cols for weight matrix */
    gsize = num_nodes/num_outputs;  /* number of nodes in a subgroup */
    ngroups = num_outputs;

    /* Allocate memory for vectors and matrices */
    out_count=vector(0,nrows-1);    /* number of times a category output
                                       is supposed to be output */
    error_vec=vector(0,nrows-1);    /* output error for output node */
    e=vector(0,nrows-1);            /* error vector */
    y=vector(0,nrows-1);            /* output vector */
    s=vector(0,nrows-1);            /* sum of weighted inputs */
    yprime=vector(0,num_nodes-1);   /* dy/dw */
    w=matrix(0,nrows-1,0,ncols-1);      /* weight matrix */
    delw=matrix(0,nrows-1,0,ncols-1);   /* delta weights */
    z=matrix(0,num_vectors,0,ncols-1);  /* input vector array */
    d=matrix(0,num_vectors,0,ncols-1);  /* desired output array */
    p=matrix3d(0,gsize-1,0,ncols-1,0,nrows-1);      /* dy/dw */
    p_old=matrix3d(0,gsize-1,0,ncols-1,0,nrows-1);  /* dy/dw */
    accuracy=ivector(0,num_outputs-1);


    /* Initialize variables to zero */
    J[0]=J[1]=0.0;

    loopij(num_vectors,ncols)
        z[i][j] = 0.;

    loopij(num_vectors,num_outputs)
        d[i][j] = 0.;

    loopi(nrows) {
        y[i] = e[i] = s[i] = error_vec[i] = 0.;
        yprime[i] = yp_min;
        loopj(ncols)
            w[i][j] = delw[i][j] = 0.;
    }

    loopg(gsize)
        loopj(ncols)
            loopk(nrows)
                p[g][j][k] = p_old[g][j][k] = 0.;

    loopi(num_outputs)
        accuracy[i] = 0;

    /* Initialize weight matrix using pseudo-random numbers */
    idum = -IABS(seed);
    ran1(&idum);
    loopi(nrows)
        loopj(ncols)
            w[i][j] = (2*ran1(&idum)-1.0);

    /* Initialize first input to 1 (non-external) */
    loopi(num_vectors)
        z[i][0]=1.;

    return;

}

/***********************************************************************

ROUTINE NAME: read_data()

DESCRIPTION: Reads data file specified for training or test.
INPUTS: none

FUNCTIONS CALLED: none
CALLED BY: main()

LAST UPDATED: 7 Mar 94 BY: Jeffrey S. Dean

***********************************************************************/

void read_data()
{
    /* Read data file external inputs */
    if((ifp=fopen(datafile, "r"))==NULL) {
        printf("Error opening data file\n");
        exit(0);
    }

    fskip_line(ifp);
    loopi(num_vectors) {
        loopj(num_inputs)
            fscanf(ifp,"%f",&z[i][j+1]);
        loopj(num_outputs) {
            fscanf(ifp,"%f",&d[i][j]);
            if(d[i][j]!=0&&d[i][j]!=1&&cat_out==1) {
                printf("bad (not category) training value! %f\n",d[i][j]);
                printf("found on line %d\n",i);
                exit(0);
            }
        }
    }

    fclose(ifp);

    return;

}


/***********************************************************************

ROUTINE NAME: save_weights()

DESCRIPTION: Saves network weights. Runs network through one more pass on
             data, capturing network outputs and output node activation
             function values.

INPUTS: none

FUNCTIONS CALLED: reset_delw_s(), propagate(), compute_output()
CALLED BY: train_net()

LAST UPDATED: 7 Mar 94 BY: Jeffrey S. Dean

***********************************************************************/

void save_weights()
{
    int out, desired;
    float max;

    ufp=fopen("weights.dat", "w");
    fprintf(ufp,"%d\n",nrows);
    loopj(nrows)
        fprintf(ufp,"%f ",y[j]);
    fprintf(ufp,"\n");

    loopi(nrows)   /* save network weights */
        loopj(ncols)
            fprintf(ufp,"% f \n",w[i][j]);
    fclose(ufp);

    if(cat_out==1)
        ofp=fopen("sequence.dat", "w");

    /* save input/outputs in recnet input file format */
    /* to allow further processing using net output data */
    efp=fopen("netout2.dat", "w");  /* Saves activation and desired outputs */
    fprintf(efp,"%d %d ",num_inputs, num_outputs);
    fprintf(efp,"%d %d %d\n",num_nodes, num_vectors,td);

    ufp=fopen("netout.dat", "w");   /* Saves net output versus desired output */
    fprintf(ufp,"%d %d ",num_inputs, num_outputs);
    fprintf(ufp,"%d %d %d\n",num_nodes, num_vectors,td);

    loopi(num_outputs)
        accuracy[i] = 0;

    desired = old_des = out = old_out = -1;

    for(t=0;t<num_vectors;t++) {  /* Loop network through data again */

        loopj(num_outputs)        /* save output nodes output */
            fprintf(ufp,"%f\t",y[j*gsize]);

        loopj(num_outputs)        /* save desired output */
            if(cat_out==1) fprintf(ufp,"% d ",(int)d[t][j]);
            else fprintf(ufp,"%5.3f ",d[t][j]);
        fprintf(ufp,"\n");

        if(cat_out==1) {
            max = -1000.;             /* find out which of the outputs */
            loopj(num_outputs)        /* has the highest value */
                if(s[j*gsize] > max) {
                    max = s[j*gsize];
                    out = j;
                }

            loopj(num_outputs)        /* Determine the correct output */
                if(d[t][j] == 1.) desired = j;

            fprintf(ofp,"%d\t%d\n", out,desired);  /* Save net output/desired output */
        }                                          /* to sequence.dat file for scoring */

        loopj(num_outputs)  /* save activation function, desired output */
            fprintf(efp,"%f ",s[j*gsize]);

        loopj(num_outputs)  /* print desired outputs */
            if(cat_out==1) fprintf(efp,"% d ",(int)d[t][j]);
            else fprintf(efp,"%5.3f ",d[t][j]);
        fprintf(efp,"\n");

        reset_delw_s();

        propagate();       /* Computes the state of the net at time t.
                              Store previous outputs y[t-1] as part of
                              the new input vector z[t][i]. Sum all
                              z[][]*w[][] inputs into the activation
                              vector s[t] for input into y[t]. */

        compute_output();  /* Compute the output y(t+1)=f[s(t)]. */
    }

    if(cat_out==1) {
        fprintf(ofp,"\n\nPercent correct per category:\n");
        loopk(num_outputs)
            fprintf(ofp,"%f ", 100.*(float)accuracy[k]/out_count[k]);
        fprintf(ofp,"\n");
    }

    fclose(ufp);
    fclose(efp);
    if(cat_out==1)
        fclose(ofp);

    return;

}


/***********************************************************************

ROUTINE NAME: read_weights()

DESCRIPTION: Reads weights for testing network or for additional training
INPUTS: none

FUNCTIONS CALLED: none
CALLED BY: main()

LAST UPDATED: 7 Mar 94 BY: Jeffrey S. Dean

***********************************************************************/

void read_weights()
{
    int nrows_w;

    ifp=fopen("weights.dat", "r");
    fscanf(ifp,"%d ",&nrows_w);
    loopj(nrows)   /* load the node outputs written by save_weights() */
        fscanf(ifp,"%f",&y[j]);

    loopi(nrows)   /* load network weights */
        loopj(ncols)
            fscanf(ifp,"%f",&w[i][j]);

    fclose(ifp);

    return;

}


/*********************************************************************** 


ROUTINE NAME: check_file()

DESCRIPTION:  Determines  if  data  file  exists.  If  not,  program  exits. 
INPUTS:  none 

FUNCTIONS  CALLED:  none 


CALLED  BY:  mainO 

LAST  UPDATED:  7  Mar  94  BY:  Jeffrey  S.  Dean 

***********************************************************************/ 


void check_file()
{
    ofp = fopen(datafile,"r");
    if(ofp == NULL) {
        printf("\n%s %s\n",datafile,": File not found.");
        exit(0);
    }
    else fclose(ofp);
    return;
}

/***********************************************************************

ROUTINE NAME: save_testfiles()
DESCRIPTION: Saves data from network test.

INPUTS: none

FUNCTIONS CALLED: none
CALLED BY: test_net()

LAST UPDATED: 7 Mar 94 BY: Jeffrey S. Dean

***********************************************************************/

void save_testfiles()
{
    int desired, out;
    float max;

    /* Output to testcheck.dat, gives inputs, training values
       and outputs of the net */
    loopj(num_outputs){
        if(cat_out==1) fprintf(ofp,"(% d :",(int)d[t][j]);
        else fprintf(ofp,"(% f :",d[t][j]);
        fprintf(ofp," % f ",y[j*gsize]);
    }

    if(error>OK_threshold) fprintf(ofp," *******");
    fprintf(ofp,"\n");

    /* Output to testdes.dat, shows training values */
    loopj(num_outputs)
        if(cat_out==1) fprintf(efp,"% d ",(int)d[t][j]);
        else fprintf(efp,"% f ",d[t][j]);
    fprintf(efp,"\n");

    if(t>0)
        fprintf(yfp,"%f\t%f\n",J[1],(float)good/(float)t*100);

    if(cat_out==1) {
        max = -1000.;           /* find out which of the outputs */
        loopj(num_outputs)      /* has the highest value */
            if(s[j*gsize] > max) {
                max = s[j*gsize];
                out = j;
            }

        loopj(num_outputs)      /* Determine the correct output */
            if(d[t][j]==1.) desired = j;

        fprintf(ufp, "%d\t\t%d\n", out,desired);
    }

    return;

}

/***********************************************************************

ROUTINE NAME: check_if_good()

DESCRIPTION: Determines if net output matches desired output. If outputs
represent membership in categories, routine first checks if any
output category should be valid.

INPUTS: Integer value representing position in data stream (iteration)

FUNCTIONS CALLED: none
CALLED BY: net_loop()

LAST UPDATED: 7 Mar 94 BY: Jeffrey S. Dean

***********************************************************************/

int check_if_good(iter)
int iter;

{

    int good_one, out, count;
    float max;

    good_one = 0; /* initialize flag */

    if((t<td) && (loopdata == 0))
        ++good_one;
    else {

        if(cat_out==1) {

            max = -1000.; /* find out which of the outputs */
            loopj(num_outputs) /* has the highest value */
                if(s[j*gsize] > max) {
                    max = s[j*gsize];
                    out=j;
                    if(out<0||out>num_outputs-1) {
                        printf("out = %d\n",out);
                        exit(0);
                    }
                }

            if((int)d[iter][out]==1) { /* If the highest value matches */
                good_one++; /* the desired category, its good.*/
                accuracy[out] += 1; /* Net has hit in category, inc count*/
            }

        }

        else { /* If the output is not a category */

            count = 0;
            loopj(num_outputs)
                if(error_vec[j]>OK_threshold) /* check if the error is low */
                    count++;
            if(count==0)
                good_one++;

        }

    }

    if(good_one >1)
        good_one = 1;
    return good_one;

}
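
The category test in check_if_good() is a winner-take-all comparison: the output with the largest activation is taken as the net's answer and is counted as a hit when the desired vector holds a 1 in that position. The following is a minimal stand-alone sketch of that idea in ANSI C, not the thesis code. The names winning_output and is_hit are hypothetical, and the sketch assumes the outputs are stored contiguously, whereas the listing indexes them as s[j*gsize].

```c
#include <stddef.h>

/* Hypothetical helper: index of the largest activation, or -1 if n == 0. */
int winning_output(const float *s, size_t n)
{
    size_t k, best = 0;

    if (n == 0)
        return -1;
    for (k = 1; k < n; k++)
        if (s[k] > s[best])
            best = k;
    return (int)best;
}

/* Hypothetical helper: 1 if the winning output is the one that the
   desired vector d marks with a 1, otherwise 0. */
int is_hit(const float *s, const float *d, size_t n)
{
    int w = winning_output(s, n);

    return (w >= 0 && (int)d[w] == 1) ? 1 : 0;
}
```

Starting the search from the first element, rather than from a fixed floor such as -1000, means a winner is always defined for a non-empty vector.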


/*********************************************************************** 

ROUTINE NAME: net_loop()

DESCRIPTION: Called for each data point, it computes the error from the
last iteration, checks whether the output can be considered
valid, resets the delta weight matrix, passes the output and
hidden node values from the last iteration to the net input
layer, and computes the net output for this iteration.

INPUTS: Flag (train) to determine if net is training (=1) or testing (=2)

FUNCTIONS CALLED:

compute_error() - determines the error between the net
output from the last iteration and the desired output
reset_delw_s() - Resets the delta weight matrix and zeros
out the weighted summed inputs from the last iteration
propagate() - passes the values produced by the top layer
of the network (hidden and output nodes) back to the
net input for this iteration

compute_output() - computes the values of the output and
hidden nodes of the net

save_testfiles() - saves test data, net output & desired
output

CALLED BY: train_net(), test_net(), save_weights()

LAST UPDATED: 7 May 93 BY: Jeffrey S. Dean

***********************************************************************/ 

void net_loop(train)
int train;

{

    J[0] = J[1]; /* Update error from last epoch. */
    J[1] = 0.; /* Initialize error for current epoch. */
    skip = 0; /* Initialize skipped updates counter. */
    good = 0; /* Initialize # right answers counter. */

    for(t=0;t<num_vectors;t++) {

        compute_error(); /* Computes the error at time t
                            How far off are the outputs from the
                            desired values? Compute total error.*/

        if(train==2)
            save_testfiles();

        reset_delw_s();

        propagate(train); /* Computes the state of the net at time t
                             Store previous outputs y[t-1] as part of
                             the new input vector z[t][i]. Sum all
                             z[][]*w[][] inputs into the activation
                             vector s[t] for input into y[t]. */

        compute_output(); /* Compute the output y(t+1)=f[s(t)]. */

        if(train==1) {

            if(error>skip_threshold) /* If error above threshold, update weights */
                update(); /* Computes del_w(t), and p(t+1). Backprop */
            else skip++; /* error through net and perform gradient */
                         /* descent to calculate the delta weights. */

            if(bp_factor>0.&&t>0)
                loopij(num_outputs,ncols)
                    w[i*gsize][j] += alpha*e[i]*bp_factor*yprime[i*gsize]*z[t-1][j];

        }

    }

    return;

}
/***********************************************************************

File containing function declarations and variable
declarations for the main program called net.c.

date: 30 May 91

written by: Randall L. Lindsey

***********************************************************************/

float *vector();
float **matrix();
float ***matrix3d();
float ran1();
int *ivector();
int **imatrix();

FILE *ifp, *ofp, *afp, *efp, *ufp, *yfp;
int run=1;
char str[80], *datafile;
int *outcount;
int nrows, ncols, g, i, j, k, l, m, n,num_categ,output_sel,cat_out;
int epochs, a, b, t, hold=5,inc, weights, norm,teacher,td, verbose;
int num_inputs, num_outputs, num_nodes, num_vectors, dble, reset;
int seed, idum=1,out_fb, linear,gsize,gl,ngroups,data_group,good,bad;
int loopdata, maxval, skip;
float J[2], sum, kronec,yp_min,momentum,junk;
float alpha, bp_factor,keep_sum;
float alpha1,error,skip_threshold,*latency, *lat_value;
float inputnoise, weightnoise, *error_vec;
float *y, *s, *e, *f, *yprime, *y_won, *mean_vect, *vectmax;
float **z, **d, **w, **delw, **y_old, **sumout;
float ***p, ***p_old, ***p_temp;
int *accuracy;

float sigmoid();
void init_net();
void train_net();
void test_net();
void read_data();
void propagate();
void propagatet();
void compute_output();
void compute_error();
void update();
void reset_delw_s();
void reset_p();
void init_weights();
void save_weights();
void read_weights();
void check_file();
void save_testfiles();
int check_if_good();
void net_loop();


/***  MACRO.H  *********************************************/ 
/*#define TRAIN true;*/
char junk_response[256];

#define fskip_line(A) fgets(junk_response, 256, A)
#define skip_line gets(junk_response)

#define rloopi(A) for(i=(A)-1;i>=0;i--)
#define rloopj(A) for(j=(A)-1;j>=0;j--)
#define rloopk(A) for(k=(A)-1;k>=0;k--)
#define rloopl(A) for(l=(A)-1;l>=0;l--)
#define rloopij(A,B) for(i=(A)-1;i>=0;i--) for(j=(B)-1;j>=0;j--)

#define loopg(A) for(g=0;g<A;g++)
#define loopi(A) for(i=0;i<A;i++)
#define loopj(A) for(j=0;j<A;j++)
#define loopk(A) for(k=0;k<A;k++)
#define loopl(A) for(l=0;l<A;l++)
#define loopij(A,B) for(i=0;i<A;i++) for(j=0;j<B;j++)

#define CREATE_FILE(A,B,C) if((A=fopen(B,"w")) == NULL) { \
printf(strcat(C,": can't open for writing - %s.\n"),B); \
exit(-1); }

#define OPEN_FILE(A,B,C) if((A=fopen(B,"r")) == NULL) { \
printf(strcat(C,": can't open for reading - %s.\n"),B); \
exit(-1);}

#define IABS(A) ((int)((-(A)<(A))?((A)):(-(A))))

#define INT_MAX (2147483647)

/** Dividing by 100 insures that cc and gcc give same results **/

#define IRAN1(A) ((int)(ran1(A)*(float)INT_MAX)/100)

/************************************************************

NRUTIL.C Numerical utility routines; allocate memory for vectors and matrices

************************************************************/

#include "malloc.h"
#include <stdio.h>

void nrerror(error_text)
char error_text[];

{

    void exit();

    fprintf(stderr, "Numerical Recipes run-time error...\n");
    fprintf(stderr,"%s\n",error_text);
    fprintf(stderr,"...now exiting to system...\n");
    exit(1);

}

float *vector(nl,nh)
int nl,nh;

{

    float *v;

    v=(float *)malloc((unsigned) (nh-nl+1)*sizeof(float));
    if (!v) nrerror("allocation failure in vector()");
    return v-nl;

}

int *ivector(nl,nh)
int nl,nh;

{

    int *v;

    v=(int *)malloc((unsigned) (nh-nl+1)*sizeof(int));
    if (!v) nrerror("allocation failure in ivector()");
    return v-nl;

}

double *dvector(nl,nh)
int nl,nh;

{

    double *v;

    v=(double *)malloc((unsigned) (nh-nl+1)*sizeof(double));
    if (!v) nrerror("allocation failure in dvector()");
    return v-nl;

}


float **matrix(nrl,nrh,ncl,nch)
int nrl,nrh,ncl,nch;

{

    int i;
    float **m;

    m=(float **) malloc((unsigned) (nrh-nrl+1)*sizeof(float*));
    if (!m) nrerror("allocation failure 1 in matrix()");
    m -= nrl;

    for(i=nrl;i<=nrh;i++) {
        m[i]=(float *) malloc((unsigned) (nch-ncl+1)*sizeof(float));
        if (!m[i]) nrerror("allocation failure 2 in matrix()");
        m[i] -= ncl;
    }

    return m;

}

float ***matrix3d(nrl,nrh,ncl,nch,ndl,ndh)
int nrl,nrh,ncl,nch,ndl,ndh;

{

    int i,j;
    float ***m;

    m=(float ***) malloc((unsigned) (nrh-nrl+1)*sizeof(float**));
    if (!m) nrerror("allocation failure 1 in matrix3d()");
    m -= nrl;

    for(i=nrl;i<=nrh;i++) {
        m[i]=(float **) malloc((unsigned) (nch-ncl+1)*sizeof(float*));
        if (!m[i]) nrerror("allocation failure 2 in matrix3d()");
        m[i] -= ncl;

        for(j=ncl;j<=nch;j++) {
            m[i][j]=(float *) malloc((unsigned) (ndh-ndl+1)*sizeof(float));
            if (!m[i][j]) nrerror("allocation failure 3 in matrix3d()");
            m[i][j] -= ndl;
        }
    }

    return m;

}
double **dmatrix(nrl,nrh,ncl,nch)
int nrl,nrh,ncl,nch;

{

    int i;
    double **m;

    m=(double **) malloc((unsigned) (nrh-nrl+1)*sizeof(double*));
    if (!m) nrerror("allocation failure 1 in dmatrix()");
    m -= nrl;

    for(i=nrl;i<=nrh;i++) {
        m[i]=(double *) malloc((unsigned) (nch-ncl+1)*sizeof(double));
        if (!m[i]) nrerror("allocation failure 2 in dmatrix()");
        m[i] -= ncl;
    }

    return m;

}

int **imatrix(nrl,nrh,ncl,nch)
int nrl,nrh,ncl,nch;

{

    int i,**m;

    m=(int **)malloc((unsigned) (nrh-nrl+1)*sizeof(int*));
    if (!m) nrerror("allocation failure 1 in imatrix()");
    m -= nrl;

    for(i=nrl;i<=nrh;i++) {
        m[i]=(int *)malloc((unsigned) (nch-ncl+1)*sizeof(int));
        if (!m[i]) nrerror("allocation failure 2 in imatrix()");
        m[i] -= ncl;
    }

    return m;

}

float **submatrix(a,oldrl,oldrh,oldcl,oldch,newrl,newcl)
float **a;
int oldrl,oldrh,oldcl,oldch,newrl,newcl;

{

    int i,j;
    float **m;

    m=(float **) malloc((unsigned) (oldrh-oldrl+1)*sizeof(float*));
    if (!m) nrerror("allocation failure in submatrix()");
    m -= newrl;

    for(i=oldrl,j=newrl;i<=oldrh;i++,j++) m[j]=a[i]+oldcl-newcl;
    return m;

}


void free_vector(v,nl,nh)
float *v;
int nl,nh;

{

    free((char*) (v+nl));

}

void free_ivector(v,nl,nh)
int *v,nl,nh;

{

    free((char*) (v+nl));

}

void free_dvector(v,nl,nh)
double *v;
int nl,nh;

{

    free((char*) (v+nl));

}

void free_matrix(m,nrl,nrh,ncl,nch)
float **m;
int nrl,nrh,ncl,nch;

{

    int i;

    for(i=nrh;i>=nrl;i--) free((char*) (m[i]+ncl));
    free((char*) (m+nrl));

}

void free_dmatrix(m,nrl,nrh,ncl,nch)
double **m;
int nrl,nrh,ncl,nch;

{

    int i;

    for(i=nrh;i>=nrl;i--) free((char*) (m[i]+ncl));
    free((char*) (m+nrl));

}

void free_imatrix(m,nrl,nrh,ncl,nch)
int **m;
int nrl,nrh,ncl,nch;

{

    int i;

    for(i=nrh;i>=nrl;i--) free((char*) (m[i]+ncl));
    free((char*) (m+nrl));

}

void free_submatrix(b,nrl,nrh,ncl,nch)
float **b;
int nrl,nrh,ncl,nch;

{

    free((char*) (b+nrl));

}

float **convert_matrix(a,nrl,nrh,ncl,nch)
float *a;
int nrl,nrh,ncl,nch;

{

    int i,j,nrow,ncol;
    float **m;

    nrow=nrh-nrl+1;
    ncol=nch-ncl+1;

    m = (float **) malloc((unsigned) (nrow)*sizeof(float*));
    if (!m) nrerror("allocation failure in convert_matrix()");
    m -= nrl;

    for(i=0,j=nrl;i<=nrow-1;i++,j++) m[j]=a+ncol*i-ncl;
    return m;

}

void free_convert_matrix(b,nrl,nrh,ncl,nch)
float **b;
int nrl,nrh,ncl,nch;

{

    free((char*) (b+nrl));

}
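
The allocators in this file all rely on one idiom: the pointer returned by malloc is shifted by the lower index, so that the caller can address elements nl through nh directly, and the free routines shift it back before releasing the block. The following is a minimal sketch of that idiom in ANSI C, offered as an illustration rather than as part of the thesis code. The names alloc_range and free_range are hypothetical. Note that forming the shifted pointer is not strictly portable when nl is positive, since the result then points before the start of the allocated block.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate floats addressable as v[nl] .. v[nh].
   Returns NULL on failure instead of exiting. */
float *alloc_range(int nl, int nh)
{
    float *base = malloc((size_t)(nh - nl + 1) * sizeof *base);

    if (base == NULL)
        return NULL;
    return base - nl;   /* same offset trick as vector() above */
}

/* Hypothetical helper: undo the offset before freeing, as free_vector() does. */
void free_range(float *v, int nl)
{
    free(v + nl);
}
```

A caller would use it as `float *v = alloc_range(-2, 2);`, index v[-2] through v[2], and release the block with `free_range(v, -2);`.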
/********************** 1.2 **********************/

/*
Constants defining mallopt operations
*/

#define M_MXFAST 1 /* set size of blocks to be fast */
#define M_NLBLKS 2 /* set number of block in a holding block */
#define M_GRAIN 3 /* set number of sizes mapped to one, for
                     small blocks */
#define M_KEEP 4 /* retain contents of block after a free until
                    another allocation */

/*
structure filled by
*/

struct mallinfo {

    int arena; /* total space in arena */
    int ordblks; /* number of ordinary blocks */
    int smblks; /* number of small blocks */
    int hblks; /* number of holding blocks */
    int hblkhd; /* space in holding block headers */
    int usmblks; /* space in small blocks in use */
    int fsmblks; /* space in free small blocks */
    int uordblks; /* space in ordinary blocks in use */
    int fordblks; /* space in free ordinary blocks */
    int keepcost; /* cost of enabling keep option */

};

char *malloc();
void free();
char *realloc();
int mallopt();
struct mallinfo mallinfo();


/*******************************************************

RAN1.C - Numerical recipes pseudo-random number generator

*******************************************************/

#define M1 259200
#define IA1 7141
#define IC1 54773
#define RM1 (1.0/M1)
#define M2 134456
#define IA2 8121
#define IC2 28411
#define RM2 (1.0/M2)
#define M3 243000
#define IA3 4561
#define IC3 51349

extern float ran1(idum)
int *idum;

{

    static long ix1,ix2,ix3;
    static float r[98];
    float temp;
    static int iff=0;
    int j;
    void nrerror();

    if (*idum < 0 || iff == 0) {
        iff=1;
        ix1=(IC1-(*idum)) % M1;
        ix1=(IA1*ix1+IC1) % M1;
        ix2=ix1 % M2;
        ix1=(IA1*ix1+IC1) % M1;
        ix3=ix1 % M3;
        for (j=1;j<=97;j++) {
            ix1=(IA1*ix1+IC1) % M1;
            ix2=(IA2*ix2+IC2) % M2;
            r[j]=(ix1+ix2*RM2)*RM1;
        }
        *idum=1;
    }

    ix1=(IA1*ix1+IC1) % M1;
    ix2=(IA2*ix2+IC2) % M2;
    ix3=(IA3*ix3+IC3) % M3;
    j=1 + ((97*ix3)/M3);
    if (j > 97 || j < 1) nrerror("RAN1: This cannot happen.");
    temp=r[j];
    r[j]=(ix1+ix2*RM2)*RM1;
    return temp;

}

#undef M1
#undef IA1
#undef IC1
#undef RM1
#undef M2
#undef IA2
#undef IC2
#undef RM2
#undef M3
#undef IA3
#undef IC3


/**************************************************************

MAKEFILE

**************************************************************/

CFLAGS = -O2 -lm

recnet : recnet.c nrutil.o ran1.o
	cc -o recnet recnet.c nrutil.o ran1.o $(CFLAGS)

nrutil.o : nrutil.c
	cc -O2 -c nrutil.c

ran1.o : ran1.c
	cc -O2 -c ran1.c

clean:
	rm -f *.o net recnet


The following listing is from the "parameters.dat" file, which defines the working
parameters under which the recurrent net operates:

epochs    alpha         seed     moment   y_pr_min
1000      0.01          152367   0.0      0.01

weights   linear        teacher  skip     cat      loop_data
1         0             0        0.0000   1        0

verbose   max_val       bp_factor
1         50            0.50

keep_sum  OK_threshold  preview
0.000     0.125         0

epochs = number of times net trains on data file
alpha = learning constant
seed = random number seed
moment = momentum term

y_pr_min = minimum value allowed for sigmoidal derivative function f(1-f)

weights = 0: generate new weights for this training session
          1: use the weights in "weights.dat" to continue training

linear = 0: output nodes use sigmoidal output
         1: output nodes use linear output

teacher = 0: do not use teacher forced training
          1: use teacher forced training

skip = error threshold above which weights are updated

cat = 0: outputs of net do not represent categories
      1: outputs of net represent categories (i.e. are 1 or 0)

loop_data = 0: zero out outputs after end of epoch
            1: Do not zero out outputs. Allows continuity of data
               passing through the net between epochs

verbose = 0: Do not print messages to stdout (screen)
          1: Print messages to stdout (screen)

max_val = limit of activation value. Above max_val, the sigmoid function
returns 1; below -max_val, the sigmoid function returns 0.

bp_factor = Gives net capability to update weights by means of standard backprop
algorithm, in addition to RTRL. Factor determines how much emphasis is
given to backprop weight updates. Usual range between 0 and 1.
Backprop only used on weights to output nodes.

keep_sum = Provides a momentum term for the activation values of the neurons.

preview = 0: Net trains on all the training data, each epoch
          1: For 1st 25% of epochs, net trains on 1st 25% of training data.
             Remaining 75% of epochs training occurs with all training data.
Appendix C: Source Code for Creation/Manipulation of Data


This appendix contains listings of the source code for the program used to
generate or modify the data used to train or test the subgrouped recurrent network, or to
evaluate network accuracy based on net outputs. Code was added as the need occurred, so no
claim is made as to program efficiency or organization.

/***********************************************************************

CREATE.C

A tool to allow manipulation of the data files used to train
and test recnet.

date: 7 May 93

written by: Jeffrey S. Dean

***********************************************************************/

#define M1 259200
#define IA1 7141
#define IC1 54773
#define RM1 (1.0/M1)
#define M2 134456
#define IA2 8121
#define IC2 28411
#define RM2 (1.0/M2)
#define M3 243000
#define IA3 4561
#define IC3 51349

#include <stdio.h>
#include "macros.h"
#include <math.h>
#include "def.h"
#include <string.h>

/***********************************************************************

ROUTINE NAME: main

DESCRIPTION: Prompts the user whether he wants to create a file to train
or test the net on a Butterworth filter response, or to load
and manipulate a data file.

INPUTS: default inputs argc and argv, not used
FUNCTIONS CALLED:

Butterworth() - Prompts user to select type of Butterworth filter data
File_work() - Prompts user for file name to be loaded, then for function
to be performed
CALLED BY: none

LAST UPDATED: 7 May 93 BY: Jeffrey S. Dean

***********************************************************************/


main(argc, argv)
int argc;
char *argv[];

{

    o = matrix(0,25000,0,64);
    v = matrix(0,25000,0,50);
    pick=matrix(0,25000,0,1);
    pntr=matrix(0,1500,0,1);
    o2 = matrix(0,25000,0,6);
    v2 = matrix(0,25000,0,28);
    numvectors = numvectors = 0.;

    Select();

    exit(0);

}

void Select()

{

    int choice;
    for (;;) {

        printf("Choose one of the following: \n");
        printf("\n1. Create a Butterworth filter response file \n");
        printf("\n2. Load and modify an existing file \n");
        printf("\n3. Create a XOR data file \n");
        printf("\n4. Manipulate sequence identification files \n");
        scanf("%d", &choice);
        printf("\n");

        if(choice==1) Butterworth();
        if(choice==2) {
            Append();
            File_work();
        }
        if(choice==3) Xor();
        if(choice==4) Sequence();
        printf("That is not a valid choice\n\n\n");

    }

}

/***********************************************************************

ROUTINE NAME: Butterworth()

DESCRIPTION: Prompts user to select between cosine, step, random or impulse
functions for building a Butterworth filter data file for
recnet
INPUTS: none
FUNCTIONS CALLED:

Cosin() - creates 128 point cosine wave values, with Butterworth filter
response as training values

Step() - creates a step function (0 to 1) input file, with Butterworth
filter response as training values

Random() - Creates a 699 point random number string (0 to 1 values), with
Butterworth filter response as training values
Impulse() - creates a 200 point series of impulses, with Butterworth filter
values as training data
CALLED BY: main()

LAST UPDATED: 7 May 93 BY: Jeffrey S. Dean

***********************************************************************/

void Butterworth()

{

    int choice;
    for (;;) {

        printf("\nDo you want to:\n");
        printf("1. Generate a cosine function training file?\n");
        printf("2. Generate a step function training file?\n");
        printf("3. Generate a random function training file?\n");
        printf("4. Generate an impulse function training file?\n");
        printf("5. Exit\n");
        skip_line;
        scanf("%d", &choice);
        printf("\n");

        if(choice==1) Cosin();
        if(choice==2) Step();
        if(choice==3) Random();
        if(choice==4) Impulse();
        if(choice==5) exit(0);
        printf("That is not a valid choice\n\n\n");

    }

}


/*************************************************************** 
ROUTINE NAME: File_work()

DESCRIPTION: Allows the user to select from multiple data file manipulation
options
INPUTS: none
FUNCTIONS CALLED:

Append() - Appends another data file to the data in memory. Number of file
inputs and outputs must match data format in memory
Save() - Saves the current form of data as a file
Merge() - Allows the user to replace the inputs or outputs of the data in
memory with those in a data file. Number of file data vectors must
match with the number of data vectors in memory.

Time_delay() - shifts the outputs ahead in time
Categories() - Prompts the user to select an output (integer) and expands
the output into category format (1 = member, 0 = nonmember)

Cull() - Extracts the data vectors in the data file that belong to one of
the possible categories
Norm() - Normalizes the inputs
Clear() - Reinitializes inputs and outputs

One_cat() - Specifically for phoneme group extraction. Performs one of two
functions: Expands outputs to all categories in a phoneme
group (nasal, vowel, etc.), with one category for non-members;
or converts output to two membership functions, either in group
or not in group.

Differentiate() - Differentiates inputs across each vector and between
vectors.

Status() - displays number of data vectors, number of inputs/outputs, and
the time delay in the outputs.

Out_types() - Displays how many of the potential categories are present in
the data

View() - Allows user to display current inputs and outputs
Phoneme() - Shows user which phonemes of each of the phoneme types are
present in the data

Compare() - If user merges outputs of file used to train/test net with
outputs net produced, the routine checks to see if the net
provided the right answer, broken down across output categories
CALLED BY: main()

LAST UPDATED: 7 May 93 BY: Jeffrey S. Dean

***********************************************************************/

void File_work()

{

    int select;
    for (;;) {

        printf("\nDo you want to:\n");
        printf("FILE TRANSFER FUNCTIONS -\n");
        printf("1. Append file\t2. Save file\t3. Merge file\n");
        printf("\nDATA MANIPULATION FUNCTIONS -\n");
        printf("4. Add time delay\t\t5. Expand in/outputs\t6. Cull outputs\n");
        printf("7. Normalize inputs\t\t8. Clear data\t\t9. Select category\n");
        printf("10. Differentiate inputs\n");
        printf("\nDATA VIEWING FUNCTIONS -\n");
        printf("11. Check status\t\t12. Check outputs\t13. View data\n");
        printf("14. Show phoneme breakdown\t15. Compare inputs/outputs\n");
        printf("\n16. Exit\n");
        skip_line;
        scanf("%d", &select);
        printf("\n");

        if(select==1) Append();
        if(select==2) Save();
        if(select==3) Merge();
        if(select==4) Time_delay();
        if(select==5) Categories();
        if(select==6) Cull();
        if(select==7) Norm();
        if(select==8) Clear();
        if(select==9) One_cat();
        if(select==10) Differentiate();
        if(select==11) Status();
        if(select==12) Out_types();
        if(select==13) View();
        if(select==14) Phoneme();
        if(select==15) Compare();
        if(select==16) exit(0);

    }

}


/***********************************************************************
ROUTINE NAME: Append()

DESCRIPTION: Prompts user for another file name, to append to the data
already in memory. Will not load file if the number of
inputs or outputs in the file do not match the data in memory.

INPUTS: none

FUNCTIONS CALLED: none
CALLED BY: File_work()

LAST UPDATED: 7 May 93 BY: Jeffrey S. Dean

***********************************************************************/
void Append()
{

    char choice;
    int numinputs, numoutputs;

    printf("\nWhat is the name of the file? :");
    skip_line;
    scanf("%s", datafile);
    printf("\n");

    printf("Reading %s ...\n",datafile);
    ifp=fopen(datafile,"r");
    fscanf(ifp,"%d %d %d",&num_inputs,&num_outputs,&num_vectors);
    printf("This file has %d inputs, %d outputs,",num_inputs,num_outputs);
    printf("and %d vectors.\n",numvectors);

    if(num_inputs != numinputs&&numinputs != 0) {
        printf("***********CAN NOT CONTINUE!!***********\n");
        printf("number of inputs not the same as before\n");
        return;
    }

    if(num_outputs != numoutputs&&numoutputs != 0) {
        printf("***********CAN NOT CONTINUE!!***********\n");
        printf("number of outputs not the same as before\n");
        return;
    }

    printf("Continue? (y/n)");
    skip_line;
    scanf("%c", &choice);
    printf("\n");

    if(choice=='y'){
        numinputs = numinputs;
        numoutputs = numoutputs;
        fskip_line(ifp);
        printf("loading data file ...\n");
        loopi(numvectors) {
            loopj(numinputs)
                fscanf(ifp,"%f",&v[i+num_vectors][j]);
            loopj(numoutputs)
                fscanf(ifp,"%f",&o[i+num_vectors][j]);
        }
        numvectors += numvectors;
    }

    fclose(ifp);

    return;

}

/***********************************************************************

ROUTINE NAME: Time_delay()

DESCRIPTION: Shifts output values in data a user selected number of
data vectors.

INPUTS: none

FUNCTIONS CALLED: none
CALLED BY: File_work()

LAST UPDATED: 7 May 93 BY: Jeffrey S. Dean

***********************************************************************/

void Time_delay()

{

    float tail[5][50];

    printf("How many ticks do you want to delay the output?\n");
    skip_line;
    scanf("%d", &td);
    printf("\n");

    printf("numvec=%d, numout=%d, td=%d\n",num_vectors, numoutputs,td);
    if(td>0) {
        loopi(td)
            loopj(numoutputs)
                tail[i][j] = o[num_vectors+i-td][j];
        loopi(num_vectors-td)
            loopj(numoutputs)
                o[num_vectors-i-1][j] = o[num_vectors-i-1-td][j];
        loopi(td)
            loopj(numoutputs)
                o[i][j] = tail[i][j];
        TD += td;
    }

    return;

}
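
Time_delay() is in effect a rotation: the last td output vectors are saved, every other output is moved td positions later, and the saved vectors wrap around to the front. The following is a minimal sketch of the same rotation for a single output column in ANSI C, offered as an illustration and not as the thesis code. The name delay_outputs is hypothetical, and the temporary buffer is sized at run time instead of using the fixed tail[5][50] array, which is an assumption about how one might lift the limit on td.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: rotate a[0..n-1] right by td positions.
   Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
int delay_outputs(float *a, size_t n, size_t td)
{
    float *tail;

    if (n == 0)
        return 0;
    td %= n;
    if (td == 0)
        return 0;

    tail = malloc(td * sizeof *tail);
    if (tail == NULL)
        return -1;

    memcpy(tail, a + (n - td), td * sizeof *tail);   /* save the last td values */
    memmove(a + td, a, (n - td) * sizeof *a);        /* shift the rest later    */
    memcpy(a, tail, td * sizeof *tail);              /* wrap the tail to front  */

    free(tail);
    return 0;
}
```

Applied with td equal to 2 to the sequence 1, 2, 3, 4, 5, the routine leaves 4, 5, 1, 2, 3, which is the ordering the loops in Time_delay() produce for each output column.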

/*********************************************************************** 

ROUTINE NAME: Categories()

DESCRIPTION: Selects one of integer outputs, asks for the range of values
represented by the output (how many potential categories) and
expands the output value to a string of 1s and 0s, with 1
representing membership in a category.

INPUTS: none

FUNCTIONS CALLED: none
CALLED BY: File_work()

LAST UPDATED: 7 May 93 BY: Jeffrey S. Dean

***********************************************************************/

void Categories()

{

    int index, io, max, rep;

    printf("1. Inputs\n2. Outputs\n");
    skip_line;
    scanf("%d",&io);
    printf("\n");

    printf("How many categories does this break down to?\n");
    skip_line;
    scanf("%d", &cat);
    printf("\n");

    if(io==2) {

        printf("Which output do you want? (1 - %d)\n",numoutputs);
        skip_line;
        scanf("%d", &sel);
        printf("\n");

        loopi(num_vectors) {
            index = o[i][sel-1];
            loopj(cat)
                o[i][j]=0.;
            if(index>=0)
                o[i][(int)index] = 1.;
        }

        numoutputs = cat;

    }

    else if(io==1) {

        printf("1. Binary representation\n2. Fully expanded\n");
        skip_line;
        scanf("%d", &rep);
        printf("\n");

        if(rep==2) {
            loopi(num_vectors) {
                index = v[i][0]+1;
                loopj(cat)
                    v[i][j] = 0.;
                if(index>=0)
                    v[i][(int)index] = 1.;
                else v[i][0] = 1.;
            }
            numinputs = cat;
        }

        else if(rep==1) {
            loopi(num_vectors) {
                max=64;
                index = v[i][0]+2;
                loopj(7)
                    v[i][j] = 0.;
                loopj(7){
                    if(index>=max) {
                        index -= max;
                        v[i][j] = 1;
                    }
                    max = max/2;
                }
            }
            numinputs = 7;
        }

    }

    return;

}
/*************************************************************** 
ROUTINE NAME: Cull()

DESCRIPTION: Prompts user to select one of integer outputs, and asks user to
select one of potential categories represented by this output.
Culls out those data vectors that do not belong to this category.
Allows user to include those non-selected category vectors
just before and immediately after the data vectors selected.
This routine allows user to extract only vowels or a specific
phoneme from a voice data file.

INPUTS: none

FUNCTIONS CALLED: none
CALLED BY: File_work()

LAST UPDATED: 7 May 93 BY: Jeffrey S. Dean

***********************************************************************/


void Cull()

{

    int out, count, count2;
    char choose_lead, all;

    printf("Which output do you want (1-%d)? ",numoutputs);
    skip_line;
    scanf("%d", &out);
    printf("\n");

    printf("Which category do you want to extract? ");
    skip_line;
    scanf("%d", &cat);
    printf("\n");

    printf("Do you want the vector before the desired category?\n");
    printf("(This will provide a lead in to the desired section)\n");
    skip_line;
    scanf("%c", &choose_lead);
    printf("\n");

    printf("Do you want all the vectors? ");
    skip_line;
    scanf("%c", &all);
    printf("\n");

    loopi(num_vectors)
        pick[i][0] = 0;
    if(all == 'y')
        loopi(num_vectors)
            pick[i][0] = 1;
    loopi(num_vectors)
        if(o[i][out-1]==cat)
            pick[i][0]=1;
    if(choose_lead == 'y')
        loopi(num_vectors) {
            if(i>1&&i<num_vectors-2) {
                if(o[i+1][out-1]==cat||o[i+2][out-1]==cat)
                    pick[i][0] = 1;
                if(o[i-1][out-1]==cat||o[i-2][out-1]==cat)
                    pick[i][0]=1;
            }
        }

    count = 0;
    count2 = 0;
    loopi(num_vectors)
        if(pick[i][0] == 1) {
            loopj(numinputs)
                v[count][j] = v[i][j];
            if(o[i][out-1]==cat){
                o[count][0] = 1.;
                o[count][1] = 0.;
                count2++;
            }
            else {
                o[count][0] = 0.;
                o[count][1] = 1.;
            }
            count += 1;
        }

    num_vectors = count - 1;
    numoutputs = 2;

    printf("Number of vectors = %d,",num_vectors);
    printf(" number of desired categories = %d,",count2);

    return;

}

/***********************************************************************

ROUTINE NAME: Norm()

DESCRIPTION: Determines max and min of each data vector, subtracts the
average of the max and min values to center data on zero.
Divides each input by half the range of input values to size
the values between -1 and 1.

INPUTS: none

FUNCTIONS CALLED: none
CALLED BY: File_work()

LAST UPDATED: 7 May 93 BY: Jeffrey S. Dean

***********************************************************************/


void  NormO 


no 


{ 

float  min,  max; 


loopi(num  vectors)  { 
min  *  100000.; 
loopj(numinputs) 
if(min>v[i][j])  min  =  v[i][j]; 
max  *  0.; 
loopj(numinputs) 
if(max<v[i][j])  max  =  v[i][j]; 
loopj(numinputs) 
v[i][j]  -=  (max+min)/2; 
loopj(numinputs) 
v[i][j]  /=  (max-min)/2; 

} 

return; 

} 
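
The centering and scaling that Norm() applies to each data vector can be sketched as a standalone function. This is a minimal sketch rather than the thesis code: the name `normalize_vector`, its signature, and the guard for a constant vector are assumptions introduced for illustration. Unlike Norm(), it seeds the minimum and maximum from the first element instead of from fixed constants.

```c
#include <stddef.h>

/* Hypothetical helper (not part of create.c): rescale one vector to [-1, 1]
 * by subtracting the midpoint of its range and dividing by half the range. */
void normalize_vector(float *x, size_t n)
{
    float min, max, mid, half;
    size_t j;

    if (n == 0)
        return;
    min = max = x[0];              /* seed from the data itself */
    for (j = 1; j < n; j++) {
        if (x[j] < min) min = x[j];
        if (x[j] > max) max = x[j];
    }
    mid = (max + min) / 2.0f;      /* center of the range */
    half = (max - min) / 2.0f;     /* half the range */
    for (j = 0; j < n; j++) {
        x[j] -= mid;
        if (half > 0.0f)           /* assumption: a constant vector stays at 0 */
            x[j] /= half;
    }
}
```

For the vector {2, 4, 6, 10} the midpoint is 6 and the half range is 4, so the smallest element maps to -1 and the largest to 1.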

/***********************************************************************

ROUTINE NAME: Save()

DESCRIPTION: Saves the data as a file

INPUTS: none

FUNCTIONS CALLED: none
CALLED BY: File_work()

LAST UPDATED: 7 May 93 BY: Jeffrey S. Dean

***********************************************************************/

void Save()
{
    char integer_out, integer_in;

    printf("What do you call the output file? ");
    skip_line;
    scanf("%s", outfile);
    printf("How many hidden nodes do you want? ");
    skip_line;
    scanf("%d", &numnodes);
    printf("Are the outputs integers? (y/n) ");
    skip_line;
    scanf("%c", &integer_out);
    printf("\n");
    printf("Are the inputs integers? (y/n) ");
    skip_line;
    scanf("%c", &integer_in);
    printf("\n");
    ofp = fopen(outfile, "w");
    printf("\nSaving data .... \n");

    /* Header line: inputs, outputs, total nodes, vectors, time delay */
    fprintf(ofp, "%d %d %d ", numinputs, numoutputs, numnodes+numoutputs);
    fprintf(ofp, "%d %d\n", num_vectors, TD);
    loopi(num_vectors) {
        if (integer_in == 'y')
            loopj(numinputs)
                fprintf(ofp, "%d ", (int)v[i][j]);
        else
            loopj(numinputs)
                fprintf(ofp, "%12.10f ", v[i][j]);
        if (integer_out == 'y')
            loopj(numoutputs)
                fprintf(ofp, "%d ", (int)o[i][j]);
        else
            loopj(numoutputs)
                fprintf(ofp, "%12.10f ", o[i][j]);
        fprintf(ofp, "\n");
    }

    fclose(ofp);

    return;
}

/*********************************************************************** 

ROUTINE NAME: View()

DESCRIPTION: Prints current values of inputs and outputs to screen

INPUTS: none

FUNCTIONS CALLED: none
CALLED BY: File_work()

LAST UPDATED: 7 May 93 BY: Jeffrey S. Dean

***********************************************************************/

void View()
{
    char cont;
    int count = 0;

    cont = '\0';
    loopi(num_vectors) {
        loopj(numinputs)
            printf("%4.2f", v[i][j]);
        printf("\n");
        loopj(numoutputs)
            printf("%4.2f", o[i][j]);
        printf("\n");
        count += 1;
        if (count > 10) {
            printf("Press <return> to continue, q to quit\n");
            skip_line;
            scanf("%c", &cont);
            printf("\n");
            count = 0;
        }
        if (cont == 'q') break;
    }

    return;
}

/***********************************************************************

ROUTINE NAME: ran1()

DESCRIPTION: Random number generator

INPUTS: integer pointer for random number seed

FUNCTIONS CALLED: none
CALLED BY: File_work()

LAST UPDATED: 7 May 93 BY: Jeffrey S. Dean

***********************************************************************/

float ran1(idum)
int *idum;
{
    static long ix1, ix2, ix3;
    static float r[98];
    float temp;
    static int iff=0;
    int j;
    void nrerror();

    if (*idum < 0 || iff == 0) {
        iff=1;
        ix1=(IC1-(*idum)) % M1;
        ix1=(IA1*ix1+IC1) % M1;
        ix2=ix1 % M2;
        ix1=(IA1*ix1+IC1) % M1;
        ix3=ix1 % M3;
        for (j=1; j<=97; j++) {
            ix1=(IA1*ix1+IC1) % M1;
            ix2=(IA2*ix2+IC2) % M2;
            r[j]=(ix1+ix2*RM2)*RM1;
        }
        *idum=1;
    }

    ix1=(IA1*ix1+IC1) % M1;
    ix2=(IA2*ix2+IC2) % M2;
    ix3=(IA3*ix3+IC3) % M3;
    j=1 + ((97*ix3)/M3);
    if (j > 97 || j < 1) printf("%s\n", "RAN1: This cannot happen.");
    temp=r[j];
    r[j]=(ix1+ix2*RM2)*RM1;
    return temp;
}

/***********************************************************************

ROUTINE NAME: Random()

DESCRIPTION: Generates a 699 point random number sequence, with the response
of a Butterworth filter associated with each point.

INPUTS: none

FUNCTIONS CALLED: ran1(), Save()
CALLED BY: Butterworth()

LAST UPDATED: 7 May 93 BY: Jeffrey S. Dean

***********************************************************************/

void Random()
{
    float class, a0, a1, a2, b0, b1;
    int idum=1, i, j, bubba;

    a0=0.0676; a1=0.1352; a2=0.0676;
    b0=1.1422; b1=-0.4124;
    idum = -IABS(737496732);
    ran1(&idum);
    o[0][0]=o[1][0]=0.0;
    loopi(710)
        v[i][0] = o[i][0] = 0.;
    loopi(600)
        v[i+50][0] = 2.0*ran1(&idum)-1.0;
    num_vectors = 700;
    loopj(3) {
        loopi(700)
            o[i+2][0]=a0*v[i+2][0]+a1*v[i+1][0]+a2*v[i][0]+b0*o[i+1][0]+b1*o[i][0];
        o[0][0]=a0*v[0][0]+a1*v[699][0]+
                a2*v[698][0]+b0*o[699][0]+b1*o[698][0];
        o[1][0]=a0*v[1][0]+a1*v[0][0]+
                a2*v[699][0]+b0*o[0][0]+b1*o[699][0];
    }

    numinputs = 1;
    numoutputs = 1;

    Time_delay();
    Save();
    exit(0);

    return;
}

/***********************************************************************

ROUTINE NAME: Impulse()

DESCRIPTION: Generates a series of impulse data points, with the response
of a Butterworth filter associated with each point.

INPUTS: none

FUNCTIONS CALLED: Save()
CALLED BY: Butterworth()

LAST UPDATED: 7 May 93 BY: Jeffrey S. Dean

***********************************************************************/

void Impulse()
{
    float class, a0, a1, a2, b0, b1;
    int idum=1, i, j, bubba;

    a0=0.0676; a1=0.1352; a2=0.0676;
    b0=1.1422; b1=-0.4124;
    o[0][0]=o[1][0]=0.0;
    loopi(302)
        v[i][0] = o[i][0] = 0.;
    loopi(2)
        v[20+(i)*128][0] = 1.;
    num_vectors = 300;
    loopj(3) {
        loopi(300)
            o[i+2][0]=a0*v[i+2][0]+a1*v[i+1][0]+
                      a2*v[i][0]+b0*o[i+1][0]+b1*o[i][0];
        o[0][0]=a0*v[0][0]+a1*v[299][0]+
                a2*v[298][0]+b0*o[299][0]+b1*o[298][0];
        o[1][0]=a0*v[1][0]+a1*v[0][0]+
                a2*v[299][0]+b0*o[0][0]+b1*o[299][0];
    }

    numinputs = 1;
    numoutputs = 1;

    Time_delay();
    Save();
    exit(0);

    return;
}

/***********************************************************************

ROUTINE NAME: Cosin()

DESCRIPTION: Generates a series of data points representing a cosine wave,
with a Butterworth filter response associated with each data
point.

INPUTS: none

FUNCTIONS CALLED: Save()
CALLED BY: Butterworth()

LAST UPDATED: 7 May 93 BY: Jeffrey S. Dean

***********************************************************************/

void Cosin()
{
    float a0, a1, a2, b0, b1;
    int i, j;

    a0=0.0676; a1=0.1352; a2=0.0676;
    b0=1.1422; b1=-0.4124;

    o[0][0]=o[1][0]=0.0;
    loopi(128)
        v[i][0] = cos(2*3.14159*i/64);
    loopi(126)
        o[i+2][0]=a0*v[i+2][0]+a1*v[i+1][0]+a2*v[i][0]+b0*o[i+1][0]+b1*o[i][0];
    num_vectors = 128;
    numinputs = 1;
    numoutputs = 1;

    Time_delay();
    Save();
    exit(0);

    return;
}
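
Random(), Impulse(), Cosin() and Step() all apply the same second-order difference equation, o[n] = a0*v[n] + a1*v[n-1] + a2*v[n-2] + b0*o[n-1] + b1*o[n-2], with the coefficients given in the listing. The following is a minimal sketch of that recurrence as a standalone function. The name `lowpass_filter`, its signature, and the zero initial conditions are assumptions for illustration; the listing instead works in place on the global arrays, and Random() and Impulse() wrap the ends of the sequence around.

```c
#include <stddef.h>

/* Coefficients as they appear in the listing. */
#define A0  0.0676f
#define A1  0.1352f
#define A2  0.0676f
#define B0  1.1422f
#define B1 (-0.4124f)

/* Hypothetical helper: filter x[0..n-1] into y[0..n-1].
 * Samples before the start of the sequence are taken as zero (assumption). */
void lowpass_filter(const float *x, float *y, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        float x1 = (i >= 1) ? x[i-1] : 0.0f;   /* previous input        */
        float x2 = (i >= 2) ? x[i-2] : 0.0f;   /* input two steps back  */
        float y1 = (i >= 1) ? y[i-1] : 0.0f;   /* previous output       */
        float y2 = (i >= 2) ? y[i-2] : 0.0f;   /* output two steps back */

        y[i] = A0*x[i] + A1*x1 + A2*x2 + B0*y1 + B1*y2;
    }
}
```

With these coefficients the steady-state gain for a constant input is (a0+a1+a2)/(1-b0-b1) = 0.2704/0.2702, so a unit step settles very close to 1.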

/***********************************************************************

ROUTINE NAME: Step()

DESCRIPTION: Generates a step function input, with the Butterworth filter
response associated with each data point.

INPUTS: none

FUNCTIONS CALLED: Save()
CALLED BY: Butterworth()

LAST UPDATED: 7 May 93 BY: Jeffrey S. Dean

***********************************************************************/

void Step()
{
    float class, a0, a1, a2, b0, b1;

    a0=0.0676; a1=0.1352; a2=0.0676;
    b0=1.1422; b1=-0.4124;
    loopi(150)
        o[i][0]=v[i][0]=0.0;
    loopi(30)
        v[i][0] = 0.;
    loopi(128)
        v[i+25][0] = 1.;
    loopi(126)
        o[i+2][0]=a0*v[i+2][0]+a1*v[i+1][0]+a2*v[i][0]+b0*o[i+1][0]+b1*o[i][0];
    num_vectors = 128;
    numinputs = 1;
    numoutputs = 1;

    Time_delay();
    Save();
    exit(0);

    return;
}

/***********************************************************************

ROUTINE NAME: Clear()

DESCRIPTION: Clears the data vectors in memory, and reinitializes the vector
count, number of inputs and outputs to zero.

INPUTS: none

FUNCTIONS CALLED: none
CALLED BY: File_work()

LAST UPDATED: 7 May 93 BY: Jeffrey S. Dean

***********************************************************************/

void Clear()
{
    loopi(num_vectors) {
        loopj(numinputs)
            v[i][j] = 0.;
        loopj(numoutputs)
            o[i][j] = 0.;
    }

    num_vectors = numvectors = 0;
    numinputs = numoutputs = 0;
    td = TD = 0;

    Select();

    return;
}

/***********************************************************************

ROUTINE NAME: Merge()

DESCRIPTION: Allows the user to replace the inputs or outputs in memory
with the inputs or outputs found in a data file. The number
of data vectors in memory must match the number of vectors in
the data file.

INPUTS: none

FUNCTIONS CALLED: none
CALLED BY: File_work()

LAST UPDATED: 7 May 93 BY: Jeffrey S. Dean

***********************************************************************/

void Merge()
{
    int choice, choice2;
    int num_inputs, num_outputs;    /* counts read from the file header */
    float junk;

    printf("\nWhat is the name of the file? :");
    skip_line;
    scanf("%s", datafile);
    printf("\n");
    printf("Reading %s . . . \n", datafile);
    ifp = fopen(datafile, "r");
    fscanf(ifp, "%d %d %d", &num_inputs, &num_outputs, &numvectors);
    printf("This file has %d inputs, %d outputs, ", num_inputs, num_outputs);
    printf("and %d vectors.\n", numvectors);
    if (num_vectors != numvectors && num_vectors != 0) {
        /* the exact wording of this banner is partly illegible in the source */
        printf("************** CAN NOT CONTINUE **************\n");
        printf("number of vectors not the same as current data\n");
        return;
    }

    printf("Do you want to swap in: \n");
    printf("1. File inputs\n");
    printf("2. File outputs\n");
    skip_line;
    scanf("%d", &choice);
    printf("\n");
    fskip_line(ifp);
    if (choice == 1) {
        printf("Do you want the inputs to be used as: \n");
        printf("1. File inputs\n");
        printf("2. File outputs\n");
        skip_line;
        scanf("%d", &choice2);
        printf("\n");
        fskip_line(ifp);
        if (choice2 == 1) {
            numinputs = num_inputs;
            printf("loading data file \n");
            loopi(numvectors) {
                loopj(num_inputs)
                    fscanf(ifp, "%f", &v[i][j]);
                loopj(num_outputs)
                    fscanf(ifp, "%f", &junk);
            }
        }
        if (choice2 == 2) {
            numoutputs = num_inputs;
            printf("loading data file \n");
            loopi(numvectors) {
                loopj(num_inputs)
                    fscanf(ifp, "%f", &o[i][j]);
                loopj(num_outputs)
                    fscanf(ifp, "%f", &junk);
            }
        }
    }
    if (choice == 2) {
        printf("Do you want the outputs to be used as: \n");
        printf("1. File inputs\n");
        printf("2. File outputs\n");
        skip_line;
        scanf("%d", &choice2);
        printf("\n");
        fskip_line(ifp);
        if (choice2 == 1) {
            numinputs = num_outputs;
            printf("loading data file outputs ...\n");
            loopi(numvectors) {
                loopj(num_inputs)
                    fscanf(ifp, "%f", &junk);
                loopj(num_outputs)
                    fscanf(ifp, "%f", &v[i][j]);
            }
        }
        if (choice2 == 2) {
            numoutputs = num_outputs;
            printf("loading data file outputs ...\n");
            loopi(numvectors) {
                loopj(num_inputs)
                    fscanf(ifp, "%f", &junk);
                loopj(num_outputs)
                    fscanf(ifp, "%f", &o[i][j]);
            }
        }
    }

    fclose(ifp);
    return;
}

/***********************************************************************

ROUTINE NAME: Status()

DESCRIPTION: Displays the current number of data vectors, number of inputs
and outputs, and the time delay of the outputs.

INPUTS: none

FUNCTIONS CALLED: none
CALLED BY: File_work()

LAST UPDATED: 7 May 93 BY: Jeffrey S. Dean

***********************************************************************/

void Status()
{
    printf("\n\n\n STATUS OF DATA: \n");
    printf("Number of vectors: %d\n", num_vectors);
    printf("%d inputs and %d outputs, ", numinputs, numoutputs);
    printf("with a time delay of %d ticks.\n", TD);
    printf("\n\n");

    return;
}


/***********************************************************************

ROUTINE NAME: Out_types()

DESCRIPTION: Prints out the categories present in the data. Assumes
output integer represents range of categories.

INPUTS: none

FUNCTIONS CALLED: none
CALLED BY: File_work()

LAST UPDATED: 7 May 93 BY: Jeffrey S. Dean

***********************************************************************/

void Out_types()
{
    int sel, types[100];

    printf("OUTPUT CATEGORIES PRESENT IN DATA\n");
    printf("Which output do you want to check? (1-%d) ", numoutputs);
    skip_line;
    scanf("%d", &sel);
    printf("\n");
    printf("How many categories are there in this output? ");
    skip_line;
    scanf("%d", &cat);
    printf("\n");

    loopi(cat)
        types[i] = 0;
    loopi(num_vectors)
        types[(int)o[i][sel-1]] = 1;
    loopi(cat)
        if (types[i] == 1) printf("%d ", i);
    printf("\n\n");
    return;
}


/***********************************************************************

ROUTINE NAME: One_cat()

DESCRIPTION: Selects one output or one output group as valid; all other
data vectors are classed together. If the file has two outputs
(as found in voice data files for this project), the routine asks
if data should be broken into subgroups (i.e. vowels, nasals,
etc.). If selected, the routine asks which group is to be
used. The routine then checks the file "phon_trans", which
lists all 64 phonemes. The number of phonemes in the subgroup
is determined, and the data outputs are expanded to provide
a category for each phoneme in the group. If the outputs are
not to be broken into subgroups or there are more than 2 outputs,
the routine assumes that the outputs represent categories, and
prompts the user to select one output. The routine then creates
two output categories for the data, one for the selected
category and one for all other data vectors.

INPUTS: none

FUNCTIONS CALLED: none
CALLED BY: File_work()

LAST UPDATED: 7 May 93 BY: Jeffrey S. Dean

***********************************************************************/

void One_cat()
{
    int index, junk, out_vect[64], cnt, sub, cont;
    float cat_data;
    char choice, out_vect_s[64][5], group[10], phon[10], sel_group[10];

    choice = '\0';
    if (numoutputs == 2) {
        printf("Do you need it broken into subgroups? (y/n) ");
        skip_line;
        scanf("%c", &choice);
        printf("\n");
        if (choice == 'y') {
            printf("Which subgroup do you want? (0 - 5)\n");
            skip_line;
            scanf("%d", &sub);
            printf("\n");
            ifp = fopen("phon_trans", "r");
            count = 0;
            loopi(63) {
                fscanf(ifp, "%d %s %d %s", &junk, phon, &cat, group);
                if (cat == sub) {
                    strcpy(sel_group, group);
                    out_vect[count] = junk;
                    strcpy(out_vect_s[count], phon);
                    count++;
                }
            }
            loopi(num_vectors) {
                cnt = 0;
                cat_data = o[i][0];
                loopj(count) {
                    if ((int)cat_data == out_vect[j]) {
                        o[i][j] = 1.;
                        cnt++;
                    }
                    else o[i][j] = 0.;
                }
                if (cnt == 0) o[i][count] = 1.;
            }
            numoutputs = count+1;
            printf("The selected category is %s\n", sel_group);
            loopi(count)
                printf("%d ", out_vect[i]);
            printf("\n");
            loopi(count)
                printf("%s ", out_vect_s[i]);
            printf("\n");
        }
    }
    else {
        printf("Which category do you want? (1 - %d)\n", numoutputs);
        skip_line;
        scanf("%d", &sel);
        printf("\n");
        loopi(num_vectors) {
            if (o[i][sel-1] == 1) {
                o[i][0] = 1.;
                /* o[i][1] = 0.; */
            }
            else {
                o[i][0] = 0.;
                /* o[i][1] = 1.; */
            }
        }
        numoutputs = 1;
    }

    printf("Press <return> to continue\n");
    skip_line;
    scanf("%c", &cont);
    return;
}

/***********************************************************************

ROUTINE NAME: Phoneme()

DESCRIPTION: Checks data to determine which phonemes are present

INPUTS: none

FUNCTIONS CALLED: none
CALLED BY: File_work()

LAST UPDATED: 7 May 93 BY: Jeffrey S. Dean

***********************************************************************/

void Phoneme()
{
    int phon[6][64];
    char cont;

    loopi(6)
        loopj(64)
            phon[i][j] = 0;

    loopi(num_vectors)
        phon[(int)o[i][1]][(int)o[i][0]] = 1;

    printf("Members of the nasal phoneme group are:\n");
    loopi(64)
        if (phon[0][i] == 1) printf("%d ", i);
    printf("\n");

    printf("Members of the vowel phoneme group are:\n");
    loopi(64)
        if (phon[1][i] == 1) printf("%d ", i);
    printf("\n");

    printf("Members of the stop phoneme group are:\n");
    loopi(64)
        if (phon[2][i] == 1) printf("%d ", i);
    printf("\n");

    printf("Members of the fricative phoneme group are:\n");
    loopi(64)
        if (phon[3][i] == 1) printf("%d ", i);
    printf("\n");

    printf("Members of the silence phoneme group are:\n");
    loopi(64)
        if (phon[4][i] == 1) printf("%d ", i);
    printf("\n");

    printf("Members of the liq-glide phoneme group are:\n");
    loopi(64)
        if (phon[5][i] == 1) printf("%d ", i);
    printf("\n");

    printf("Press <return> to continue\n");
    skip_line;
    scanf("%c", &cont);
    return;
}

/***********************************************************************

ROUTINE NAME: Compare()

DESCRIPTION: Determines which input value is the largest, then checks to see
whether the corresponding output value is a 1. Used to compare net
outputs with the desired outputs; checks net accuracy for each
output category.

INPUTS: none

FUNCTIONS CALLED: none
CALLED BY: File_work()

LAST UPDATED: 7 May 93 BY: Jeffrey S. Dean

***********************************************************************/

void Compare()
{
    float max;
    int high_out;
    int score[64][2];
    char cont;

    loopi(64)
        loopj(2)
            score[i][j] = 0;

    if (numinputs != numoutputs) {
        printf("Different number of inputs and outputs. Can't compare.\n");
        return;
    }

    loopi(num_vectors) {
        max = -1000.;
        loopj(numinputs)
            if (v[i][j] > max) {
                max = v[i][j];
                high_out = j;
            }
        loopj(numinputs)
            if (o[i][j] == 1) score[j][0]++;
        if (o[i][high_out] == 1)
            score[high_out][1]++;
    }

    loopi(numinputs) {
        printf("Category %d: %d examples", i, score[i][0]);
        if (score[i][0] > 0)
            printf(", %f%% correct\n", ((float)score[i][1]/score[i][0])*100);
        printf("\n");
    }

    printf("Press <return> to continue\n");
    skip_line;
    scanf("%c", &cont);
    return;
}
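
The scoring rule in Compare() (take the largest net output as the chosen category, then check it against the desired one-hot output) can be sketched without the global arrays. This is an illustrative sketch under stated assumptions: the names `argmax` and `score_categories`, the flat row-major layout, and the separate `examples` and `correct` arrays are hypothetical and stand in for the `v`, `o` and `score` arrays of the listing.

```c
#include <stddef.h>

/* Hypothetical helper: index of the largest element of x[0..n-1]. */
size_t argmax(const float *x, size_t n)
{
    size_t j, best = 0;

    for (j = 1; j < n; j++)
        if (x[j] > x[best])
            best = j;
    return best;
}

/* Hypothetical helper: per-category example and hit counts.
 * net  : rows x ncat net outputs, row-major
 * want : rows x ncat desired outputs (one-hot), row-major */
void score_categories(const float *net, const float *want,
                      size_t rows, size_t ncat,
                      int *examples, int *correct)
{
    size_t r, c, best;

    for (c = 0; c < ncat; c++)
        examples[c] = correct[c] = 0;

    for (r = 0; r < rows; r++) {
        const float *net_row = net + r * ncat;
        const float *want_row = want + r * ncat;

        for (c = 0; c < ncat; c++)          /* count examples per category */
            if (want_row[c] == 1.0f)
                examples[c]++;

        best = argmax(net_row, ncat);       /* category chosen by the net */
        if (want_row[best] == 1.0f)         /* credit it only if desired  */
            correct[best]++;
    }
}
```

The percentage printed by Compare() for category c then corresponds to 100 * correct[c] / examples[c] whenever examples[c] is nonzero.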

/***********************************************************************

ROUTINE NAME: Differentiate()

DESCRIPTION: Replaces inputs in each data vector with the difference between
the inputs, then replaces inputs in each data vector with the
difference between the input and the next data vector input.

INPUTS: none

FUNCTIONS CALLED: none
CALLED BY: File_work()

LAST UPDATED: 7 May 93 BY: Jeffrey S. Dean

***********************************************************************/

void Differentiate()
{
    loopi(num_vectors) {
        loopj(numinputs-1)
            v[i][j] = v[i][j+1] - v[i][j];
        v[i][0] = 0.;
    }

    loopi(num_vectors-1)
        loopj(numinputs)
            v[i][j] = v[i+1][j] - v[i][j];
    loopj(numinputs)
        v[0][j] = 0.;

    return;
}

/***********************************************************************

ROUTINE NAME: Xor()

DESCRIPTION: Creates XOR training/test data for the neural net, with either integer or
floating point values.

INPUTS: none

FUNCTIONS CALLED: none
CALLED BY: Select()

LAST UPDATED: 7 May 93 BY: Jeffrey S. Dean

***********************************************************************/

void Xor()
{
    float class;
    int idum=1, choice, seed;   /* seed is read with %d, so it is held as an int */

    printf("\nEnter random number seed\n");
    scanf("%d", &seed);
    idum = -IABS(seed);
    printf("\nDo you wish:\n1. Integer values\n2. Floating point values\n");
    scanf("%d", &choice);
    printf("\n");
    loopi(1024) {
        loopj(2) {
            v[i][j]=ran1(&idum);
            if (choice == 1) {
                if (v[i][j] > 0.5) v[i][j]=1.0;
                else v[i][j]=0.0;
            }
        }

        if ((v[i][0] > 0.5) && (v[i][1] > 0.5))
            o[i][0]=0.0;
        if ((v[i][0] <= 0.5) && (v[i][1] > 0.5))
            o[i][0]=1.0;
        if ((v[i][0] > 0.5) && (v[i][1] <= 0.5))
            o[i][0]=1.0;
        if ((v[i][0] <= 0.5) && (v[i][1] <= 0.5))
            o[i][0]=0.0;
    }

    numoutputs = 1;
    num_vectors = 1024;
    numinputs = 2;

    Time_delay();
    Save();
    exit(0);
    return;
}
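
The four threshold tests in Xor() reduce to a single rule: the target is 1 when exactly one of the two inputs exceeds 0.5. A minimal sketch of that rule follows; the function name `xor_target` is hypothetical and does not appear in create.c.

```c
/* Hypothetical helper: XOR target for two inputs thresholded at 0.5.
 * An input counts as "high" only when it is strictly greater than 0.5,
 * matching the > and <= tests in the listing. */
float xor_target(float a, float b)
{
    int high_a = (a > 0.5f);
    int high_b = (b > 0.5f);

    return (high_a != high_b) ? 1.0f : 0.0f;
}
```

Because the threshold is applied to the raw values, the same rule serves both the integer and the floating point variants of the data set.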

/***********************************************************************

ROUTINE NAME: Sequence()

DESCRIPTION: Allows the user to select a function for dealing with sequentially related
data. Functions include:

1. Converting codeword sequences (seq length codeword1 codeword2 ...) into net
format

2. Randomize training/test sequences, so that sequence categories are mixed

3. Convert sequences with Fourier coefficient inputs to net format

4. Read in "sequence.dat" file and check accuracy of net output

INPUTS: none

FUNCTIONS CALLED: Convert(), Randomize(), Fourier_input(), and Score_seq()
CALLED BY: Select()

LAST UPDATED: 7 Jan 94 BY: Jeffrey S. Dean

***********************************************************************/

void Sequence()
{
    int choice;

    num_vectors = numvectors = 0;
    for (;;) {
        printf("Choose one of the following: \n");
        printf("\n1. Convert codeword sequences to net format \n");
        printf("\n2. Randomize sequence of codeword strings \n");
        printf("\n3. Convert Fourier magnitude sequences to net format\n");
        printf("\n4. Score the accuracy of a sequence.dat file\n");
        scanf("%d", &choice);
        printf("\n");

        if (choice == 1) Convert();
        if (choice == 2) Randomize();
        if (choice == 3) Fourier_input();
        if (choice == 4) Score_seq();
    }

    return;
}

/***********************************************************************

ROUTINE NAME: Convert()

DESCRIPTION: Convert codeword sequences to net format

INPUTS: none

FUNCTIONS CALLED: File_work()
CALLED BY: Sequence()

LAST UPDATED: 7 Jan 94 BY: Jeffrey S. Dean

***********************************************************************/

void Convert()
{
    int t, categ, length, sect;
    float junk;

    TD = 0;

    printf("\nWhat is the name of the file? :");
    skip_line;
    scanf("%s", datafile);
    printf("\n");
    printf("\nWhich category does this belong to? :");
    skip_line;
    scanf("%d", &categ);
    printf("\n");
    ifp = fopen(datafile, "r");
    t = 0;
    loopk(4)
        loopi(50) {
            fscanf(ifp, "%d", &length);
            loopj(length) {
                fscanf(ifp, "%f", &v[t][0]);
                loopl(6)
                    o[t][l] = 0;
                o[t][categ] = 1;
                t++;
            }
            loopj(6) {
                v[t][0] = -1;
                loopl(6)
                    o[t][l] = 0;
                o[t][0] = 1;
                t++;
            }
        }

    num_vectors = t - 1;
    numinputs = 1;
    numoutputs = 6;
    fclose(ifp);
    File_work();
    num_vectors = 0;
    return;
}

/***********************************************************************

ROUTINE NAME: Randomize()

DESCRIPTION: Randomize sequences for training/test data

INPUTS: none

FUNCTIONS CALLED: File_work()
CALLED BY: Sequence()

LAST UPDATED: 7 Jan 94 BY: Jeffrey S. Dean

***********************************************************************/

void Randomize()
{
    int choice, idum=15756, junk, range, incr, max;

    printf("\nWhat is the name of the file? :");
    skip_line;
    scanf("%s", datafile);
    printf("\n");
    printf("Reading %s . . . \n", datafile);
    ifp = fopen(datafile, "r");
    fscanf(ifp, "%d %d %d %d %d", &numinputs, &numoutputs, &junk, &num_vectors, &TD);
    numvectors = num_vectors;
    loopi(1000)
        pick[i][0] = 0;

    loopi(numvectors) {                 /* Load in data */
        loopj(numinputs)
            fscanf(ifp, "%f", &v2[i][j]);
        loopj(numoutputs)
            fscanf(ifp, "%f", &o2[i][j]);
    }

    incr = 0;
    pntr[0][0] = 0;
    count = 1;

    printf("Examining sequence starting positions\n");
    loopi(numvectors-1)                 /* find out where sequences start */
        if (o2[i+1][0] != 1. && o2[i][0] == 1.) {
            pntr[count][0] = i+1;
            /* printf("%d %d\n ", pntr[count][0], count); */
            count++;
        }
    max = count;
    pntr[max][0] = numvectors;
    printf("\n\n");
    count = 0;

    printf("Randomizing sequences\n");
    loopi(max) {                        /* Pick one of sequences in file randomly */
        for (;;) {
            choice = (int)(max*ran1(&idum));
            if (choice >= 0 && choice < max)
                if (pick[choice][0] == 0.)
                    break;
        }

        pick[choice][0] = 1.;           /* Identify sequence as picked */
        range = pntr[choice+1][0]-pntr[choice][0];

        /*
        printf("%d %d %d %d\n", incr, choice, pntr[choice][0], range);
        */

        incr++;

        /* Loop from beginning of this sequence to next */
        loopj(range) {
            loopk(numinputs)
                v[count][k] = v2[pntr[choice][0]+j][k];
            loopk(numoutputs)
                o[count][k] = o2[pntr[choice][0]+j][k];
            count++;
        }
    }

    fclose(ifp);
    File_work();
    return;
}

/***********************************************************************

ROUTINE NAME: Fourier_input()

DESCRIPTION: Reads in sequences of 28 Fourier amplitude coefficients and converts
them to network input format.

INPUTS: none

FUNCTIONS CALLED: File_work()
CALLED BY: Sequence()

LAST UPDATED: 7 Jan 94 BY: Jeffrey S. Dean

***********************************************************************/

void Fourier_input()
{
    int Start, categ, t;

    printf("\nWhat is the name of the file? :");
    skip_line;
    scanf("%s", datafile);
    printf("\n");
    printf("\nWhich category does this belong to? :");
    skip_line;
    scanf("%d", &categ);
    printf("\n");
    printf("Reading %s . . . \n", datafile);
    ifp = fopen(datafile, "r");

    Start = 14;
    t = 0;

    loopi(4) {
        loopj(50) {
            loopk(Start) {
                loopl(28)
                    fscanf(ifp, "%f", &v[t][l]);
                loopl(4)
                    o[t][l] = 0.;
                o[t][categ] = 1.;
                t++;    /* this increment and brace are missing from the scan; restored by analogy with Convert() */
            }
            loopk(4) {
                loopl(28)
                    v[t][l] = 0.;
                loopl(4)
                    o[t][l] = 0.;
                o[t][0] = 1.;
                t++;
            }
        }
        Start += 2;
    }

    num_vectors = t;
    numinputs = 28;
    numoutputs = 4;
    fclose(ifp);
    File_work();
    return;
}

/***********************************************************************

ROUTINE NAME: Score_seq()

DESCRIPTION: Scores network on accuracy in determining sequence category, based on
network response on last ten points of each sequence

INPUTS: none

FUNCTIONS CALLED: File_work()
CALLED BY: Sequence()

LAST UPDATED: 7 Jan 94 BY: Jeffrey S. Dean

***********************************************************************/

void Score_seq()
{
    int check[10], index, count, max, total[10], good[10], num_sequence;
    int num_seq;
    float score;

    ifp = fopen("sequence.dat", "r");
    fscanf(ifp, "%d %d %d", &num_inputs, &num_outputs, &num_vectors);
    printf("This file has %d inputs, %d outputs, ", num_inputs, num_outputs);
    printf("and %d vectors.\n", num_vectors);
    fskip_line(ifp);

    printf("loading data file ...\n");
    loopi(num_vectors) {
        fscanf(ifp, "%f", &v[i][0]);
        fscanf(ifp, "%f", &o[i][0]);
        if (v[i][0] > 9 || v[i][0] < 0) printf("out of bounds, line %d\n", i);
        if (o[i][0] > 9 || o[i][0] < 0) printf("out of bounds, line %d\n", i);
    }
    fclose(ifp);

    printf("\nData file loaded.\n");
    num_sequence = 0;
    loopi(num_vectors) {
        if (o[i][0] == 0 && o[i+1][0] != 0)
            num_sequence++;
    }
    printf("There are %d sequences\n", num_sequence);
    loopi(10) {
        good[i] = 0;
        total[i] = 0;
    }
    num_seq = 0;

    printf("Starting to process sequences\n");
    i = 0;
    loopk(num_sequence) {
        while (o[i][0] == 0.)           /* Move to next sequence */
            i++;
        loopj(10)                       /* Zero count of categories for sequence */
            check[j] = 0;
        count = 0;
        while (o[i][0] != 0 && i < num_vectors) {   /* While in a sequence */
            if (count > 10)
                check[(int)v[i][0]] += 1;   /* Tally outputs of net */
            i++;                        /* Increment to next position */
            count++;                    /* Count length of sequence */
        }

        num_seq++;
        max = -1;
        loopj(10)                       /* find out which output most often */
            if (check[j] > max) {       /* chosen by net for this sequence */
                max = check[j];
                index = j;              /* Index is most chosen category */
            }
        if (index == o[i-1][0])         /* If index is right answer */
            good[index]++;              /* Show that category was scored */
        total[(int)o[i-1][0]]++;        /* correctly. Inc. count of category */
    }

    count = 0;
    loopi(10)
        count += good[i];
    score = 100*(float)count/(float)num_sequence;
    printf("The net scored %f%% of the sequences correctly\n\n", score);

    loopi(10)
        if (total[i] > 0) {
            score = 100*(float)good[i]/(float)total[i];
            printf(" Category %d was scored %f%% correctly\n", i, score);
        }

    exit(0);
    return;
}
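
Score_seq() labels each sequence with the category the net chose most often, ignoring the first points of the sequence. A minimal sketch of that majority vote is given here. The name `majority_category`, the `skip` parameter, and the -1 return value for an empty tally are assumptions for illustration; the listing instead tallies only when its running count exceeds 10 and leaves the index at category 0 if nothing was tallied.

```c
#include <stddef.h>

#define MAX_CATEGORIES 10   /* the listing tallies ten categories */

/* Hypothetical helper: most frequent category in labels[skip..n-1].
 * Ties go to the lowest category index, as with the strict > test in
 * the listing. Labels outside 0..MAX_CATEGORIES-1 are ignored.
 * Returns -1 when no label was tallied (assumption). */
int majority_category(const int *labels, size_t n, size_t skip)
{
    int counts[MAX_CATEGORIES] = {0};
    int best = -1, best_count = 0, c;
    size_t t;

    for (t = skip; t < n; t++)
        if (labels[t] >= 0 && labels[t] < MAX_CATEGORIES)
            counts[labels[t]]++;

    for (c = 0; c < MAX_CATEGORIES; c++)
        if (counts[c] > best_count) {
            best_count = counts[c];
            best = c;
        }
    return best;
}
```

A sequence is then counted as correct when the returned category equals the desired category stored with the sequence.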

#undef IC3
#undef M1
#undef IA1
#undef IC1
#undef RM1
#undef M2
#undef IA2
#undef IC2
#undef RM2
#undef M3
#undef IA3

/* def.h ****************************************************************

File containing function declarations and variable
declarations for the main program called create.c.

date: 11 Jun 93
written by: Jeffrey S. Dean

***********************************************************************/

int *vector();
float **matrix();
int **imatrix();

FILE *ifp, *ofp;

char str[80], *datafile[20], *outfile[20];
int i, j, k, cat, sel;
int num_inputs, num_outputs, numnodes, num_vectors, numvectors, td, TD;
int numinputs, numoutputs, count;
int *pick;
float **v, **o;

void Butterworth();
void File_work();
void Append();
void Time_delay();
void Categories();
void Cull();
void Norm();
void View();
void Random();
void Cosin();
void Step();
void Save();
void Impulse();
void Clear();
void Merge();
void Status();
void Out_types();
void One_cat();
void Phoneme();
void Compare();
void Differentiate();
void Xor();

#include <stdio.h>
#include <math.h>


/«»»«•*••»**»*****•«*••»*********( 

FFT.C  •  Fast  Fourier  Transform  program 


^define  loopi(A)  for(i”0;i<(A)U++) 

#define  loopj(A)  for(j=Oj<(A)y-H-) 

#define  loopij(A3)  for  (H);  i<(A);  i++)\ 
for  (j*0;  j<(B);  j-^); 

#defineSQ(A)  (A*  A) 

#definePI  3.1415926 

main(argc,argv) 

intargc; 

char*argvQ; 

{ 

FILE  *fin,  *fout; 
float  *output,*input,*truncout; 
float  norm; 
float  *vectorO; 

/*void  doflipO;*/ 
void  foumQ; 

/♦void  truncateO;*/ 

/♦void  ♦fiee_vector();*/ 
char  name[30]; 

int  ij,  nn[l],  ndim,  isign,  neworder,  order,  image_size; 
if(aigc  !=  3)  { 

printf("!!!  The  command  line  should  be  !!!:\n\n  ffi_trunc  infile  outfile  \n\n"); 
exit(0); 

} 

printf^"!!!  Input  the  input  images  SIZE  and  ORDER: "); 
vanf("%d%d",&iniage_size,&order); 

/♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦set  up  dynamic  allocation*****************/ 


input  *  \nctor(0^*image_size*image_size-l); 
output  =  vector(0,image_size*image_size-l); 
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Set  Up  Files 


V 


/* 


if  ((fin-fopen(aigv[l],"r"))  —  NULL)  { 
printf("I  can't  open  the  input  file"); 
e»t(-l); 

} 

if  ((fout*fopen(argv[2],"w"))  —  NULL){ 
printf("I  cant  open  the  output  file"); 
«dt(-l); 

> 


/*****«***«****Rea(J  pj|c 

loopi(2  *  imagesize*  imagesize- 1 )  /♦  initialize  array  to  zero  */ 
input[i]  *  0.0; 

loopi(image_size*image_size- 1 )  /*read  data  in  the  foum  format  ♦/ 

fscanf(fin,  "%f\n",  &input[i*2]);  /*  see  numerical  recipes  in  c  */ 

fclose(fin);  /♦close  input  file  ♦/ 

/**♦♦*  Initialization  parameters  for  FFT  ♦•******/ 

nn[0]=image_size;  /*  size  of  mput  IAW  foumO  */ 
nn[  1  ]=image_size; 

ndim=l;  /*  one  dim  FFT  ♦/ 

/*ndim=2;  */  /♦  two  dim  FFT  ♦/ 

isign=l;  /♦  FFT  ♦/ 

foum(input- 1 4m- 1 ,ndim,isign); 


    /*********** Find Fourier Magnitude ***********/

    j=0;
    for(i=0;i<(2*image_size*image_size-1); i+=2) {
        output[j]=sqrt((double)SQ(input[i])+(double)SQ(input[i+1]));
        j++;
    }

    norm=output[0]; /* d.c. component used for normalisation */
    printf("%4.0f\n",norm);

    /***** normalize and write output of FFT in argv[2] file *****/

    loopi(image_size*image_size) {
        output[i]=output[i]/norm;
        fprintf(fout, "%1.4f\n", output[i]);
    }

    fclose(fout);

    /***** doflip *******************************/

    /* doflip(output,image_size); */ /* converts fourn format to human format */

    /*printf("%4.4f\n",output[8128]);*/

    /******* truncate ********************************************/

    /* truncate takes fft(output) of size(image_size) and truncates the FFT to **/
    /* order specified plus d.c. the array is returned in truncout, the argv[2]*/
    /* is used as a header when truncate writes the output in netfft.dat */

    /* if(order != 0){
        neworder = 2 * order - 1;
        truncout = vector(0,image_size*image_size-1);
        truncate(output,image_size,order,trunc_out, argv[2]);
        free_vector(trunc_out,0,image_size*image_size-1);
    } */

    free_vector(input,0,2*image_size*image_size-1);
    free_vector(output,0,image_size*image_size-1);
}
/**********************************************************************

NAME: fourn.c

DESCRIPTION: Numerical Recipes multi dimensional FFT routine.
Requires a complex column vector as follows:

/ real a(1) /
/ complex a(1) /
/ real a(2) /
/ complex a(2) /
/ etc /

SUBROUTINES CALLED:

WRITTEN BY: Numerical Recipes in C

*********************************************************************/

#include <math.h>

#define SWAP(a,b) tempr=(a);(a)=(b);(b)=tempr

void fourn(data,nn,ndim,isign)
float data[];
int nn[],ndim,isign;
{
    int i1,i2,i3,i2rev,i3rev,ip1,ip2,ip3,ifp1,ifp2;
    int ibit,idim,k1,k2,n,nprev,nrem,ntot;
    float tempi,tempr;
    double theta,wi,wpi,wpr,wr,wtemp;

    ntot=1;
    for (idim=1;idim<=ndim;idim++)
        ntot *= nn[idim];
    nprev=1;

    for (idim=ndim;idim>=1;idim--) {
        n=nn[idim];
        nrem=ntot/(n*nprev);
        ip1=nprev << 1;
        ip2=ip1*n;
        ip3=ip2*nrem;
        i2rev=1;

        /* bit-reversal reordering */
        for (i2=1;i2<=ip2;i2+=ip1) {
            if (i2 < i2rev) {
                for (i1=i2;i1<=i2+ip1-2;i1+=2) {
                    for (i3=i1;i3<=ip3;i3+=ip2) {
                        i3rev=i2rev+i3-i2;
                        SWAP(data[i3],data[i3rev]);
                        SWAP(data[i3+1],data[i3rev+1]);
                    }
                }
            }
            ibit=ip2 >> 1;
            while (ibit >= ip1 && i2rev > ibit) {
                i2rev -= ibit;
                ibit >>= 1;
            }
            i2rev += ibit;
        }

        /* Danielson-Lanczos section */
        ifp1=ip1;
        while (ifp1 < ip2) {
            ifp2=ifp1 << 1;
            theta=isign*6.28318530717959/(ifp2/ip1);
            wtemp=sin(0.5*theta);
            wpr = -2.0*wtemp*wtemp;
            wpi=sin(theta);
            wr=1.0;
            wi=0.0;
            for (i3=1;i3<=ifp1;i3+=ip1) {
                for (i1=i3;i1<=i3+ip1-2;i1+=2) {
                    for (i2=i1;i2<=ip3;i2+=ifp2) {
                        k1=i2;
                        k2=k1+ifp1;
                        tempr=wr*data[k2]-wi*data[k2+1];
                        tempi=wr*data[k2+1]+wi*data[k2];
                        data[k2]=data[k1]-tempr;
                        data[k2+1]=data[k1+1]-tempi;
                        data[k1] += tempr;
                        data[k1+1] += tempi;
                    }
                }
                wr=(wtemp=wr)*wpr-wi*wpi+wr;
                wi=wi*wpr+wtemp*wpi+wi;
            }
            ifp1=ifp2;
        }
        nprev *= n;
    }
}


#undef  SWAP 


Appendix  D.  Payton  Auditory  Model 


One of the applications most often cited as a potential use for recurrent neural
networks is speech analysis. Because of the grammatical rules inherent in language,
speech naturally has a sequential structure that can be learned by a recurrent network,
which can learn temporal probabilities as well as the spatial (frequency) probabilities
calculated by a standard backprop net. The speech data used to train the net can be
generated in several ways. One standard method is to apply a Fast Fourier Transform
(FFT) to the sampled and digitized speech, and use the Fourier coefficients as the
training data for the network. Variations on this approach include using cepstral,
Discrete Cosine Transform (DCT) or wavelet coefficients. All of these approaches are
based on transform algorithms that convert the temporal domain information into a
frequency domain. Each of these approaches has its advantages and disadvantages.

In the same way that neural network designs are inspired by how neurons work in
living systems, many researchers have been trying to emulate the way in which the hearing
systems of animals process sound energy into information encoded in the auditory nerves.
The acoustical mechanics of the ear allow us to pick out one voice among many and to make
sense of the series of vowels and consonants we hear with relatively high reliability.
The ear works in a highly nonlinear way to pick out the formants, or peak frequency points,
which are critical in understanding speech.

The Payton(8) auditory model is one of many algorithms(5)(8) that model the way
in which our auditory systems convert sound into neural impulses. This model produces
20 outputs that correspond to the predicted activity of 20 cochlear neurons that carry
sound information to the brain in a living mammal.
Figure 31: A plot of voice data preprocessed by the Payton algorithm. Note the peaks in
the data representing the speech formants.